Biomedical informatics for proteomics

Boguski, Mark S.; McIntosh, Martin W.

doi:10.1038/nature01515

Review Article
Published: 13 March 2003

Nature volume 422, pages 233–237 (2003)

Abstract

Success in proteomics depends upon careful study design and high-quality biological samples. Advanced information technologies, and also an ability to use existing knowledge to the full, will be crucial in making sense of the data. Despite its genome-scale potential, proteome analysis is at a much earlier stage of development than genomics and gene expression (microarray) studies. Fundamental issues involving biological variability, pre-analytic factors and analytical reproducibility remain to be resolved. Consequently, the analysis of proteomics data is currently informal and relies heavily on expert opinion. Databases and software tools developed for the analysis of molecular sequences and microarrays are helpful, but are limited owing to the unique attributes of proteomics data and differing research goals.

**Figure 1: Experimental versus observational study.**

Acknowledgements

We thank L. Hartwell, J. Potter and G. Omenn for stimulating discussions and J. Gray, J. Pounds and L. Geer for valuable suggestions and critical readings of the manuscript.

Author information

Authors and Affiliations

Human Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, Washington 98109, USA
Mark S. Boguski
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, Washington 98109, USA
Martin W. McIntosh

About this article

Cite this article

Boguski, M., McIntosh, M. Biomedical informatics for proteomics. Nature 422, 233–237 (2003). https://doi.org/10.1038/nature01515

Issue Date: 13 March 2003
DOI: https://doi.org/10.1038/nature01515
