A systematic approach to modeling, capturing, and disseminating proteomics experimental data

Taylor, Chris F.; Paton, Norman W.; Garwood, Kevin L.; Kirby, Paul D.; Stead, David A.; Yin, Zhikang; Deutsch, Eric W.; Selway, Laura; Walker, Janet; Riba-Garcia, Isabel; Mohammed, Shabaz; Deery, Michael J.; Howard, Julie A.; Dunkley, Tom; Aebersold, Ruedi; Kell, Douglas B.; Lilley, Kathryn S.; Roepstorff, Peter; Yates, John R.; Brass, Andy; Brown, Alistair J.P.; Cash, Phil; Gaskell, Simon J.; Hubbard, Simon J.; Oliver, Stephen G.

doi:10.1038/nbt0303-247

Perspective
Published: March 2003

Nature Biotechnology volume 21, pages 247–254 (2003)

Abstract

Both the generation and the analysis of proteome data are becoming increasingly widespread, and the field of proteomics is moving incrementally toward high-throughput approaches. Techniques are also increasing in complexity as the relevant technologies evolve. A standard representation of both the methods used and the data generated in proteomics experiments, analogous to that of the MIAME (minimum information about a microarray experiment) guidelines for transcriptomics, and the associated MAGE (microarray gene expression) object model and XML (extensible markup language) implementation, has yet to emerge. This hinders the handling, exchange, and dissemination of proteomics data. Here, we present a UML (unified modeling language) approach to proteomics experimental data, describe XML and SQL (structured query language) implementations of that model, and discuss capture, storage, and dissemination strategies. These make explicit what data might be most usefully captured about proteomics experiments and provide complementary routes toward the implementation of a proteome repository.

**Figure 1: Examples of the types of data generated by proteomics experiments.**

**Figure 2: The PEDRo UML class diagram provides a conceptual model of proteomics experiment data, which form the basis for the XML and relational schemas.**

Acknowledgements

Special thanks go to Francesco Brancia, Jenny Ho, and Sandy Yates for their critical appraisal of the Schema at various stages. This work was supported by a grant from the Investigating Gene Function (IGF) Initiative of the Biotechnology & Biological Sciences Research Council to S.G.O., N.W.P., A.B., S.G., S.H., P.C., and A.J.P.B. for the COGEME (Consortium for the Functional Genomics of Microbial Eukaryotes) program. D.B.K. thanks the BBSRC for financial support, also under the IGF initiative. K.L.G. is supported by the North West Regional e-Science centre (ESNW), within the UK eScience Programme. Many people have contributed their advice and expertise to the design of PEDRo, at various meetings formal and otherwise, notably attendees at the 2002 Proteomics Standards Initiative meeting of the Human Proteome Organisation at the European Bioinformatics Institute.

Author information

Authors and Affiliations

School of Biological Sciences, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Chris F. Taylor, Paul D. Kirby, Andy Brass & Stephen G. Oliver
Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Chris F. Taylor, Norman W. Paton, Kevin L. Garwood, Paul D. Kirby & Andy Brass
Department of Molecular & Cell Biology, Institute of Medical Science, University of Aberdeen, Aberdeen, AB25 2ZF, UK
David A. Stead, Zhikang Yin, Laura Selway, Janet Walker, Alistair J.P. Brown & Phil Cash
Institute for Systems Biology, 1441 N 34th St., Seattle, Washington, 98103, USA
Eric W. Deutsch & Ruedi Aebersold
Department of Chemistry, UMIST, PO Box 88, Manchester, M60 1QD, UK
Isabel Riba-Garcia, Shabaz Mohammed, Douglas B. Kell & Simon J. Gaskell
Department of Biomolecular Sciences, UMIST, PO Box 88, Manchester, M60 1QD, UK
Simon J. Hubbard
Inpharmatica Ltd, 60 Charlotte Street, London, UK
Michael J. Deery
Department of Biochemistry, University of Cambridge, Building O, Downing Site, Cambridge, CB2 1QW, UK
Julie A. Howard, Tom Dunkley & Kathryn S. Lilley
Department of Biochemistry & Molecular Biology, University of Southern Denmark, Campusvej 55, Odense M, DK-5230, Denmark
Peter Roepstorff
Department of Cell Biology, Scripps Clinic & Research Institute, La Jolla, California, 92037, USA
John R. Yates III

Corresponding author

Correspondence to Stephen G. Oliver.

About this article

Cite this article

Taylor, C., Paton, N., Garwood, K. et al. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat Biotechnol 21, 247–254 (2003). https://doi.org/10.1038/nbt0303-247

Received: 03 January 2003
Accepted: 27 January 2003
Issue Date: March 2003
DOI: https://doi.org/10.1038/nbt0303-247
