Abstract
Protein biomarker discovery produces lengthy lists of candidates that must subsequently be verified in blood or other accessible biofluids. Use of targeted mass spectrometry (MS) to verify disease- or therapy-related changes in protein levels requires the selection of peptides that are quantifiable surrogates for proteins of interest. Peptides that produce the highest ion-current response (high-responding peptides) are likely to provide the best detection sensitivity. Identification of the most effective signature peptides, particularly in the absence of experimental data, remains a major resource constraint in developing targeted MS–based assays. Here we describe a computational method that uses protein physicochemical properties to select high-responding peptides and demonstrate its utility in identifying signature peptides in plasma, a complex proteome with a wide range of protein concentrations. Our method, which employs a Random Forest classifier, facilitates the development of targeted MS–based assays for biomarker verification or any application where protein levels need to be measured.
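As an illustration only, the kind of input such a classifier needs can be sketched as follows: a protein sequence is digested in silico into tryptic peptides, and each peptide is described by a few physicochemical features. Everything in the sketch is an assumption made for illustration rather than the published method: the function names `tryptic_digest` and `peptide_features`, the simple cleavage rule (after K or R, not before P, no missed cleavages), the four features, and the use of the Kyte-Doolittle hydropathy scale. The toy sequence is hypothetical.

```python
# Illustrative sketch only: not the published feature set or pipeline.

# Kyte-Doolittle hydropathy values, used here as one example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tryptic_digest(protein):
    """Split a sequence after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end in K or R
        peptides.append(protein[start:])
    return peptides


def peptide_features(peptide):
    """Return a small, hypothetical feature dictionary for one peptide."""
    length = len(peptide)
    return {
        "length": length,
        "mean_hydropathy": sum(KD[aa] for aa in peptide) / length,
        "basic_residues": sum(peptide.count(aa) for aa in "KRH"),
        "acidic_residues": sum(peptide.count(aa) for aa in "DE"),
    }


if __name__ == "__main__":
    toy_protein = "AAKRPGGKLLR"  # hypothetical sequence, not a real protein
    for pep in tryptic_digest(toy_protein):
        print(pep, peptide_features(pep))
```

Feature vectors of this kind, computed for peptides labeled as high- or low-responding from experimental data, would be the training input for a Random Forest classifier; the trained model could then rank the candidate peptides of a protein for which no experimental data exist.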
Acknowledgements
We thank the National Cancer Institute (NCI) Clinical Proteomic Technology Assessment in Cancer Program (NCI-CPTAC, http://proteomics.cancer.gov/programs/CPTAC/) for providing samples of yeast lysate and raw MS data generated by the CPTAC centers. We thank Rushdy Ahmad, Kathy Do, Amy Ham, Emily Rudomin, and Shao-En Ong for MS data generation, and Hasmik Keshishian and Terri Addona for generating the lists of validated MRM peptides. We also thank Shao-En Ong, Jacob Jaffe, Karl Clauser, Eric Kuhn, Pablo Tamayo, and Nick Patterson for helpful discussions, and the reviewers for their insightful comments. This work was supported in part by grants to S.A.C. from the National Institutes of Health (grant 1U24 CA126476, as part of the NCI's Clinical Proteomic Technologies Assessment in Cancer Program), the National Heart, Lung, and Blood Institute (U01-HL081341) and The Women's Cancer Research Fund; to J.P.M. from the National Science Foundation and the National Institutes of Health (NIGMS and NCI); and to D.R.M. from the National Institutes of Health (grant R01 CA126219, as part of the NCI's Clinical Proteomic Technologies for Cancer Program).
Supplementary information
Supplementary Figures 1–7, Methods, Data (PDF 483 kb)
Supplementary Table 1
Ranked list of 550 physicochemical properties (XLS 77 kb)
Supplementary Table 2
Validated MRM peptides (XLS 49 kb)
Cite this article
Fusaro, V., Mani, D., Mesirov, J. et al. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat Biotechnol 27, 190–198 (2009). https://doi.org/10.1038/nbt.1524
DOI: https://doi.org/10.1038/nbt.1524