A systematic approach to modeling, capturing, and disseminating proteomics experimental data


Both the generation and the analysis of proteome data are becoming increasingly widespread, and the field of proteomics is moving incrementally toward high-throughput approaches. Techniques are also increasing in complexity as the relevant technologies evolve. A standard representation of both the methods used and the data generated in proteomics experiments, analogous to that of the MIAME (minimum information about a microarray experiment) guidelines for transcriptomics, and the associated MAGE (microarray gene expression) object model and XML (extensible markup language) implementation, has yet to emerge. This hinders the handling, exchange, and dissemination of proteomics data. Here, we present a UML (unified modeling language) approach to proteomics experimental data, describe XML and SQL (structured query language) implementations of that model, and discuss capture, storage, and dissemination strategies. These make explicit what data might be most usefully captured about proteomics experiments and provide complementary routes toward the implementation of a proteome repository.
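As a rough illustration of the idea that one conceptual model can drive both an XML and a relational implementation, the following minimal sketch defines two toy classes and serializes them both ways. The class names, fields, element names, and table layout (`Experiment`, `Sample`, `hypothesis`, `organism`, and so on) are assumptions made for this example only; they are not taken from the PEDRo schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes for illustration only; NOT the actual PEDRo model.
@dataclass
class Sample:
    sample_id: str
    organism: str


@dataclass
class Experiment:
    experiment_id: str
    hypothesis: str
    samples: List[Sample] = field(default_factory=list)


def to_xml(exp: Experiment) -> str:
    """Serialize the object graph as an XML document (element names are assumed)."""
    root = ET.Element("Experiment", id=exp.experiment_id)
    ET.SubElement(root, "Hypothesis").text = exp.hypothesis
    for s in exp.samples:
        ET.SubElement(root, "Sample", id=s.sample_id, organism=s.organism)
    return ET.tostring(root, encoding="unicode")


def to_sql(exp: Experiment, conn: sqlite3.Connection) -> None:
    """Store the same object graph in relational tables (one table per class)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment "
        "(id TEXT PRIMARY KEY, hypothesis TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample "
        "(id TEXT PRIMARY KEY, organism TEXT, "
        "experiment_id TEXT REFERENCES experiment(id))"
    )
    conn.execute(
        "INSERT INTO experiment VALUES (?, ?)",
        (exp.experiment_id, exp.hypothesis),
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(s.sample_id, s.organism, exp.experiment_id) for s in exp.samples],
    )
    conn.commit()


if __name__ == "__main__":
    exp = Experiment(
        "E1",
        "Example hypothesis text",
        [Sample("S1", "Saccharomyces cerevisiae")],
    )
    print(to_xml(exp))
    with sqlite3.connect(":memory:") as conn:
        to_sql(exp, conn)
        print(conn.execute("SELECT id, organism FROM sample").fetchall())
```

In this sketch the one-to-many association between the two classes becomes element nesting in the XML form and a foreign key in the relational form. That is one common way of mapping a class diagram onto the two targets, not necessarily the mapping the authors used.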

Figure 1: Examples of the types of data generated by proteomics experiments.
Figure 2: The PEDRo UML class diagram provides a conceptual model of proteomics experiment data; this model forms the basis for the XML and relational schemas.
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8




Special thanks go to Francesco Brancia, Jenny Ho, and Sandy Yates for their critical appraisal of the Schema at various stages. This work was supported by a grant from the Investigating Gene Function (IGF) Initiative of the Biotechnology & Biological Sciences Research Council to S.G.O., N.W.P., A.B., S.G., S.H., P.C., and A.J.P.B. for the COGEME (Consortium for the Functional Genomics of Microbial Eukaryotes) program. D.B.K. thanks the BBSRC for financial support, also under the IGF initiative. K.L.G. is supported by the North West Regional e-Science centre (ESNW), within the UK eScience Programme. Many people have contributed their advice and expertise to the design of PEDRo, at various meetings formal and otherwise, notably attendees at the 2002 Proteomics Standards Initiative meeting of the Human Proteome Organisation at the European Bioinformatics Institute.

Author information

Corresponding author

Correspondence to Stephen G. Oliver.

About this article

Cite this article

Taylor, C., Paton, N., Garwood, K. et al. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat Biotechnol 21, 247–254 (2003).
