  • Analysis
A HUPO test sample study reveals common problems in mass spectrometry–based proteomics

A Corrigendum to this article was published on 01 July 2009

This article has been updated


We performed a test sample study to identify errors leading to irreproducibility, including incompleteness of peptide sampling, in liquid chromatography–mass spectrometry–based proteomics. We distributed an equimolar test sample, comprising 20 highly purified recombinant human proteins, to 27 laboratories. Each protein contained one or more unique tryptic peptides of 1,250 Da to test for ion selection and sampling in the mass spectrometer. Of the 27 labs, members of only 7 labs initially reported all 20 proteins correctly, and members of only 1 lab reported all tryptic peptides of 1,250 Da. Centralized analysis of the raw data, however, revealed that all 20 proteins and most of the 1,250 Da peptides had been detected in all 27 labs. Our centralized analysis identified missed identifications (false negatives), environmental contamination, database matching and curation of protein identifications as sources of problems. Improved search engines and databases are needed for mass spectrometry–based proteomics.

Figure 1: Number of tandem mass spectra assigned to tryptic peptides.
Figure 2: Discrepancies between reported data and centralized analysis identify erroneous reporting.

Change history

  • 29 June 2009

    NOTE: In the version of this article initially published, the author name Steven A Carr was spelled incorrectly, and the name of an organization described in the text, the HUPO Proteomics Standards Initiative (PSI), was given incorrectly. These errors have been corrected in the PDF and HTML versions of this article.



Supported in part by the Canadian Institutes of Health Research to the HUPO Headquarters (S. Ouellette) for coordination of this HUPO test sample initiative. A.W.B. and C.E.A. were supported by Genome Quebec and McGill University. We thank D. Juncker, G. Temple, J. van Oostrum, G. Omenn, K. Colwill, J. Langridge, M. Hallett and D.M. Desiderio for their helpful comments on the manuscript. This test sample effort builds on pioneering efforts from several other groups, especially the Association of Biomolecular Resource Facilities. This study is a HUPO test sample initiative, and HUPO welcomes collaborative efforts to benefit proteomics. We acknowledge the following sources of grant support: E.W.D. is supported by the National Heart, Lung and Blood Institute, National Institutes of Health (NIH), under contract N01-HV-28179; the University of California, Los Angeles Burnham Institute for Medical Research NIH grant number RR020843; University of California, Los Angeles (National Heart, Lung and Blood Institute P01-008111); University of Michigan, NIH P41RR018627; Beijing Proteome Research Center, affiliated with The Beijing Institute of Radiation Medicine, for National Key Programs for Basic Research grant 2006CB910801 and Hi-Tech Research grant 2006AA02A308. We acknowledge access to and use of The University College Dublin Conway Mass Spectrometry Resource instrumentation, supported by Science Foundation Ireland grant 04/RPI/B499. PRIDE, J.A.V. is a postdoctoral fellow of the “Especialización en Organismos Internacionales” program from the Spanish Ministry of Education and Science. L.M. is supported by the “ProDaC” grant LSHG-CT-2006-036814 of the EU. The Samuel Lunenfeld Research Institute, Mount Sinai, Toronto, is supported by Genome Canada through the Ontario Genomics Institute. J.A.V. and L.M. thank H. Hermjakob and R. Apweiler for their support. A.W.B. thanks L. Roy and Z. Bencsath-Makkai for help in data submission and analysis.

Author information

Authors and Affiliations




A.W.B. coordinated all steps of the study. C.E.A., T.N. and J.J.M.B. coordinated data analysis and the final manuscript. E.W.D., R.B. and R.K. did the centralized analysis of the collective data retrieved from the raw data supplied from each lab to Tranche. S.A.C., P.P., L.M., E.K., C.D., S.S., X.Q., K.W., T.P.C., K.P. and T.A.B. provided comments. Invitrogen prepared, designed and distributed the test sample proteins.

Corresponding author

Correspondence to John J M Bergeron.

Ethics declarations

Competing interests

There is a potential to market the test samples used in this study.

Additional information

A full list of authors appears at the end of this paper.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Tables 1–5 and 7–11, Supplementary Note, Supplementary Methods (PDF 2922 kb)

Supplementary Table 6

Initial results as submitted by 24 academic laboratories and 3 vendors. (XLS 110 kb)

Supplementary Table 12

Final results and repeat analyses as submitted by 24 academic laboratories and 3 vendors. (PDF 136 kb)

Supplementary Table 13

Peptides identified by centralized analysis of the data. (XLS 607 kb)

Supplementary Table 14

Tranche hash and passphrase codes. (XLS 25 kb)

Supplementary Table 15

Proteins identified by centralized analysis of the data. (XLS 177 kb)

About this article

Cite this article

Bell, A., Deutsch, E., Au, C. et al. A HUPO test sample study reveals common problems in mass spectrometry–based proteomics. Nat Methods 6, 423–430 (2009).
