The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article applies a more rigorous statistical approach that accounts for the length of coding regions in genes and uses multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome, as well as the value of such data sets for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously nonannotated gene sequences.
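The abstract names two ingredients, a correction for coding-region length and a multiple hypothesis-testing adjustment, without giving the model itself. The following is a minimal sketch of how such a calculation could look, not the authors' implementation. It assumes that random peptide matches land on proteins in proportion to their length (a Poisson null) and uses a Bonferroni correction at the 5% level; the function names, the example proteins and the count of random matches are all hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def length_adjusted_pvalues(proteins, total_random_peptides):
    """Compute a p-value per protein under a length-proportional null.

    proteins: dict mapping name -> (length_in_residues, peptides_matched).
    total_random_peptides: assumed number of chance peptide matches
    spread over the whole database (a hypothetical input).
    """
    total_length = sum(length for length, _ in proteins.values())
    pvalues = {}
    for name, (length, n_peptides) in proteins.items():
        # Longer proteins are expected to collect more chance matches.
        expected = total_random_peptides * length / total_length
        pvalues[name] = poisson_upper_tail(n_peptides, expected)
    return pvalues


def bonferroni_accept(pvalues, alpha=0.05):
    """Keep identifications whose p-value survives Bonferroni correction."""
    m = len(pvalues)
    return {name for name, p in pvalues.items() if p * m <= alpha}


# Hypothetical toy database: two proteins with three peptides each,
# plus a block of unmatched sequence standing in for the rest.
toy_database = {
    "short_protein": (200, 3),
    "long_protein": (20000, 3),
    "unmatched_rest": (79800, 0),
}
toy_pvalues = length_adjusted_pvalues(toy_database, total_random_peptides=10)
confident = bonferroni_accept(toy_pvalues)
```

In this toy case the short protein is accepted and the long one is not, even though both have three matched peptides: three chance matches are far more plausible on a 20,000-residue sequence than on a 200-residue one. This illustrates why a simple "two or more peptides" rule can overstate confidence for long gene products.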
The collaborative HUPO Plasma Protein study and the data analysis presented here have been supported by a trans-National Institutes of Health grant supplement 84982 administered by the National Cancer Institute, by pharmaceutical and technology company sponsors and by voluntary efforts of collaborating laboratories.
The authors declare no competing financial interests.
Accrual of identifications as a function of sampling. (PDF 20 kb)
Complement component 3 isoforms. (PDF 20 kb)
Numbers of protein identifications by specimen and by methodologies applied in individual laboratories. (PDF 90 kb)
List of high-confidence protein identifications. (PDF 116 kb)
Intragenic peptides not in an annotated exon. (PDF 15 kb)
States, D., Omenn, G., Blackwell, T. et al. Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol 24, 333–338 (2006). https://doi.org/10.1038/nbt1183