

Prediction of high-responding peptides for targeted protein assays by mass spectrometry

Abstract

Protein biomarker discovery produces lengthy lists of candidates that must subsequently be verified in blood or other accessible biofluids. Use of targeted mass spectrometry (MS) to verify disease- or therapy-related changes in protein levels requires the selection of peptides that are quantifiable surrogates for proteins of interest. Peptides that produce the highest ion-current response (high-responding peptides) are likely to provide the best detection sensitivity. Identification of the most effective signature peptides, particularly in the absence of experimental data, remains a major resource constraint in developing targeted MS–based assays. Here we describe a computational method that uses protein physicochemical properties to select high-responding peptides and demonstrate its utility in identifying signature peptides in plasma, a complex proteome with a wide range of protein concentrations. Our method, which employs a Random Forest classifier, facilitates the development of targeted MS–based assays for biomarker verification or any application where protein levels need to be measured.
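As a rough illustration of the kind of input such a classifier consumes, the sketch below computes a few sequence-derived physicochemical features for a peptide. The feature set (length, monoisotopic mass, mean Kyte-Doolittle hydropathy, basic-residue count), the function name `peptide_features`, and the example sequences are assumptions chosen for illustration; they are not the properties or code used in the paper.

```python
# Illustrative sketch only: a handful of per-peptide features, NOT the paper's feature set.

# Kyte-Doolittle hydropathy index per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Monoisotopic residue masses in daltons (residue = amino acid minus water).
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_features(seq):
    """Return a dict of simple physicochemical features for one peptide."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return {
        "length": len(seq),
        "mono_mass": sum(MONO_MASS[aa] for aa in seq) + WATER,
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq),
        "basic_residues": sum(seq.count(aa) for aa in "KRH"),
    }


if __name__ == "__main__":
    # Arbitrary example sequences (hypothetical inputs).
    for pep in ("LVNELTEFAK", "AEFVEVTK"):
        print(pep, peptide_features(pep))
```

A feature vector of this kind, computed for each candidate peptide of a protein, could then be passed to a trained classifier such as a Random Forest to rank the peptides; model training and the full property list are omitted here.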


Figure 1: ESP application and model development overview.
Figure 2: ESP predictor validation and method comparison.
Figure 3: ESP predictions translate into experimentally validated MRM peptides.
Figure 4: Analysis of important physicochemical properties in predicting high-responding peptides.



Acknowledgements

We thank the National Cancer Institute (NCI) Clinical Proteomic Technology Assessment in Cancer Program (NCI-CPTAC, http://proteomics.cancer.gov/programs/CPTAC/) for providing samples of yeast lysate and raw MS data generated by the CPTAC centers. We thank Rushdy Ahmad, Kathy Do, Amy Ham, Emily Rudomin, and Shao-En Ong for MS data generation, and Hasmik Keshishian and Terri Addona for generating the lists of validated MRM peptides. We also thank Shao-En Ong, Jacob Jaffe, Karl Clauser, Eric Kuhn, Pablo Tamayo, and Nick Patterson for helpful discussions, and the reviewers for their insightful comments. This work was supported in part by grants to S.A.C. from the National Institutes of Health (grant 1U24 CA126476, as part of the NCI's Clinical Proteomic Technology Assessment in Cancer Program), the National Heart, Lung, and Blood Institute (U01-HL081341) and The Women's Cancer Research Fund; to J.P.M. from the National Science Foundation and the National Institutes of Health (NIGMS and NCI); and to D.R.M. from the National Institutes of Health (grant R01 CA126219, as part of the NCI's Clinical Proteomic Technologies for Cancer Program).

Author information

Correspondence to Jill P Mesirov or Steven A Carr.

Supplementary information

Supplementary Figures 1–7, Methods, Data (PDF 483 kb)

Supplementary Table 1

Ranked list of 550 physicochemical properties (XLS 77 kb)

Supplementary Table 2

Validated MRM peptides (XLS 49 kb)

Supplementary Source Code (ZIP 84009 kb)

About this article

Cite this article

Fusaro, V., Mani, D., Mesirov, J. et al. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat Biotechnol 27, 190–198 (2009). https://doi.org/10.1038/nbt.1524

  • DOI: https://doi.org/10.1038/nbt.1524
