Abstract
We performed a test sample study to identify errors leading to irreproducibility, including incomplete peptide sampling, in liquid chromatography–mass spectrometry–based proteomics. We distributed an equimolar test sample comprising 20 highly purified recombinant human proteins to 27 laboratories. Each protein contained one or more unique tryptic peptides of 1,250 Da to test ion selection and sampling in the mass spectrometer. Of the 27 labs, only 7 initially reported all 20 proteins correctly, and only 1 reported all tryptic peptides of 1,250 Da. Centralized analysis of the raw data, however, revealed that all 20 proteins and most of the 1,250 Da peptides had been detected in all 27 labs. Our centralized analysis identified missed identifications (false negatives), environmental contamination, database matching and curation of protein identifications as sources of problems. Improved search engines and databases are needed for mass spectrometry–based proteomics.
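The idea of a tryptic peptide of a given mass can be illustrated with a minimal sketch of an in-silico digest followed by a mass filter. This is not the study's pipeline: the function names, the example sequence, the simplified cleavage rule (cut after K or R unless the next residue is P, no missed cleavages) and the 5 Da default tolerance are all assumptions made for illustration. Residue masses are standard monoisotopic values.

```python
# Illustrative sketch only; names, tolerance and example sequence are hypothetical.

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def peptides_near_mass(sequence, target=1250.0, tolerance=5.0):
    """Tryptic peptides whose mass lies within `tolerance` Da of `target`.

    The default tolerance is an assumption, not a value taken from the text.
    """
    return [
        peptide
        for peptide in tryptic_digest(sequence)
        if abs(peptide_mass(peptide) - target) <= tolerance
    ]


if __name__ == "__main__":
    example = "MKAAARPGGKDE"  # made-up sequence, not one of the test proteins
    for peptide in tryptic_digest(example):
        print(peptide, round(peptide_mass(peptide), 4))
```

A real search engine also has to consider missed cleavages, modifications and charge states, which this sketch deliberately ignores.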
Change history
29 June 2009
NOTE: In the version of this article initially published, the author name Steven A Carr was spelled incorrectly, and the name of an organization described in the text, the HUPO Proteomics Standards Initiative (PSI), was given incorrectly. These errors have been corrected in the PDF and HTML versions of this article.
Acknowledgements
Supported in part by Canadian Institutes of Health Research funding to the HUPO Headquarters (S. Ouellette) for coordination of this HUPO test sample initiative. A.W.B. and C.E.A. were supported by Genome Quebec and McGill University. We thank D. Juncker, G. Temple, J. van Oostrum, G. Omenn, K. Colwill, J. Langridge, M. Hallett and D.M. Desiderio for their helpful comments on the manuscript. This test sample effort builds on pioneering efforts from several other groups, especially the Association of Biomolecular Resource Facilities. This study is a HUPO test sample initiative, and HUPO welcomes collaborative efforts to benefit proteomics. We acknowledge the following sources of grant support: E.W.D. is supported by the National Heart, Lung and Blood Institute, National Institutes of Health (NIH), under contract N01-HV-28179; the University of California, Los Angeles Burnham Institute for Medical Research NIH grant number RR020843; University of California, Los Angeles (National Heart, Lung and Blood Institute P01-008111); University of Michigan, NIH P41RR018627; Beijing Proteome Research Center, affiliated with The Beijing Institute of Radiation Medicine for National Key Programs for Basic Research grant 2006CB910801 and Hi-Tech Research grant 2006AA02A308. We acknowledge access to and use of The University College Dublin Conway Mass Spectrometry Resource instrumentation, supported by Science Foundation Ireland grant 04/RPI/B499. PRIDE: J.A.V. is a postdoctoral fellow of the “Especialización en Organismos Internacionales” program from the Spanish Ministry of Education and Science. L.M. is supported by the “ProDaC” grant LSHG-CT-2006-036814 of the EU. Samuel Lunenfeld Research Institute, Mount Sinai, Toronto is supported by Genome Canada through Ontario Genomics Institute. J.A.V. and L.M. thank H. Hermjakob and R. Apweiler for their support. A.W.B. thanks L. Roy and Z. Bencsath-Makkai for help in data submission and analysis.
Author information
Contributions
A.W.B. coordinated all steps of the study. C.E.A., T.N. and J.J.M.B. coordinated data analysis and the final manuscript. E.W.D., R.B. and R.K. performed the centralized analysis of the collective data, retrieved from the raw data that each lab supplied to Tranche. S.A.C., P.P., L.M., E.K., C.D., S.S., X.Q., K.W., T.P.C., K.P. and T.A.B. provided comments. Invitrogen designed, prepared and distributed the test sample proteins.
Ethics declarations
Competing interests
There is a potential to market the test samples used in this study.
Additional information
A full list of authors appears at the end of this paper.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–11, Supplementary Tables 1–5 and 7–11, Supplementary Note, Supplementary Methods (PDF 2922 kb)
Supplementary Table 6
Initial results as submitted by 24 academic laboratories and 3 vendors. (XLS 110 kb)
Supplementary Table 12
Final results and repeat analyses as submitted by 24 academic laboratories and 3 vendors. (PDF 136 kb)
Supplementary Table 13
Peptides identified by centralized analysis of the data. (XLS 607 kb)
Supplementary Table 14
Tranche hash and passphrase codes. (XLS 25 kb)
Supplementary Table 15
Proteins identified by centralized analysis of the data. (XLS 177 kb)
About this article
Cite this article
Bell, A., Deutsch, E., Au, C. et al. A HUPO test sample study reveals common problems in mass spectrometry–based proteomics. Nat Methods 6, 423–430 (2009). https://doi.org/10.1038/nmeth.1333
DOI: https://doi.org/10.1038/nmeth.1333
This article is cited by
- Measurements of heterogeneity in proteomics analysis of the nanoparticle protein corona across core facilities. Nature Communications (2022)
- KSTAR: An algorithm to predict patient-specific kinase activities from phosphoproteomic data. Nature Communications (2022)
- The Translational Status of Cancer Liquid Biopsies. Regenerative Engineering and Translational Medicine (2021)
- Data-independent acquisition mass spectrometry for site-specific glycoproteomics characterization of SARS-CoV-2 spike protein. Analytical and Bioanalytical Chemistry (2021)
- Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nature Communications (2021)