  • Review Article

Biomedical informatics for proteomics

Abstract

Success in proteomics depends upon careful study design and high-quality biological samples. Advanced information technologies, and also an ability to use existing knowledge to the full, will be crucial in making sense of the data. Despite its genome-scale potential, proteome analysis is at a much earlier stage of development than genomics and gene expression (microarray) studies. Fundamental issues involving biological variability, pre-analytic factors and analytical reproducibility remain to be resolved. Consequently, the analysis of proteomics data is currently informal and relies heavily on expert opinion. Databases and software tools developed for the analysis of molecular sequences and microarrays are helpful, but are limited owing to the unique attributes of proteomics data and differing research goals.

Figure 1: Experimental versus observational study.

Acknowledgements

We thank L. Hartwell, J. Potter and G. Omenn for stimulating discussions and J. Gray, J. Pounds and L. Geer for valuable suggestions and critical readings of the manuscript.

Cite this article

Boguski, M., McIntosh, M. Biomedical informatics for proteomics. Nature 422, 233–237 (2003). https://doi.org/10.1038/nature01515
