Letter | Published:

A probability-based approach for high-throughput protein phosphorylation analysis and site localization


Data analysis and interpretation remain major logistical challenges when attempting to identify large numbers of protein phosphorylation sites by nanoscale reverse-phase liquid chromatography/tandem mass spectrometry (LC-MS/MS) (Supplementary Figure 1 online). In this report we address challenges that are often only addressable by laborious manual validation, including data set error, data set sensitivity and phosphorylation site localization. We provide a large-scale phosphorylation data set with a measured error rate as determined by the target-decoy approach, we demonstrate an approach to maximize data set sensitivity by efficiently distracting incorrect peptide spectral matches (PSMs), and we present a probability-based score, the Ascore, that measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra. We applied our methods in a fully automated fashion to nocodazole-arrested HeLa cell lysate where we identified 1,761 nonredundant phosphorylation sites from 491 proteins with a peptide false-positive rate of 1.3%.
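As an illustration of how a target-decoy search can yield a measured error rate, the following Python sketch estimates the false-positive rate of a filtered list of PSMs. It assumes, in line with the Figure 1 caption, that decoy sequences absorb about half of the incorrect matches, so the decoy count is doubled. The function name, the boolean-flag input format and the example counts are hypothetical and are not taken from the article.

```python
def estimate_false_positive_rate(is_decoy_flags):
    """Estimate the false-positive rate of accepted PSMs from a composite
    target/decoy search.

    Assumption: incorrect matches land on target and decoy sequences equally
    often, so every decoy hit stands in for one unseen incorrect target hit.
    """
    total = len(is_decoy_flags)
    if total == 0:
        raise ValueError("no PSMs supplied")
    decoys = sum(1 for flag in is_decoy_flags if flag)
    return 2 * decoys / total


# Hypothetical example: 1,000 accepted PSMs, 7 of which matched decoy sequences.
flags = [True] * 7 + [False] * 993
rate = estimate_false_positive_rate(flags)
print(f"estimated false-positive rate: {rate:.1%}")
```

With these made-up counts the estimate is 1.4%. In practice the flags would come from whichever filtering criteria are applied to the search results, and tightening those criteria trades sensitivity for a lower estimated rate.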





Acknowledgments

We thank David Chiang, James Candlin and the software developers at Sage-N-Research for early access to Sequest-Sorcerer and on-the-fly peptide reversal within the Sequest algorithm. We thank P. Everley, C. Bakalarski and B. Faherty for in-house software development and D. Schwartz for motif analysis. HeLa cell lysate was generously provided by M. Rape. This work was supported in part by grants from the National Institutes of Health (HG03456 and GM67945).

Author information

S.A.B. conducted all experiments and carried out algorithm development, implementation and data analysis. J.V. and S.A.G. provided analytical expertise for SCX chromatography and mass spectrometry. J.R. provided synthetic peptide libraries and immunoprecipitation data. S.P.G. provided overall experimental design and support.

Correspondence to Steven P Gygi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

General strategy for the large-scale analysis of protein phosphorylation. (PDF 757 kb)

Supplementary Fig. 2

Determining an effective search space using the target/decoy strategy. (PDF 863 kb)

Supplementary Fig. 3

Residue composition from data sets of known phosphorylation sites. (PDF 1768 kb)

Supplementary Fig. 4

Comparison of the Ascore versus Sequest scoring criteria for site localization. (PDF 3070 kb)

Supplementary Fig. 5

Comparison of the Ascore versus Mascot scoring criteria for site localization. (PDF 3256 kb)

Supplementary Fig. 6

Sequence logos (http://weblogo.berkeley.edu/) of the motifs identified in this large-scale data set, shown in Supplementary Table 1, with an Ascore > 15. (PDF 978 kb)

Supplementary Fig. 7

Biological processes of 362 of the 491 proteins identified in this experiment. (PDF 622 kb)

Supplementary Fig. 8

SDS-PAGE gel used in this experiment. (PDF 5115 kb)

Supplementary Table 1

Filtering criteria for the entire experiment. (PDF 1053 kb)

Supplementary Table 2

2,836 identified phosphopeptides from the entire experiment. (XLS 1649 kb)

Supplementary Table 3

Synthetic peptide libraries. (XLS 815 kb)

Supplementary Table 4

Immunoprecipitation experiments. (XLS 495 kb)


Figure 1: Composite target/decoy database searching strategy provides an accurate estimate of false-positive rates for large data sets by knowingly distracting fifty percent of the error.
Figure 2: Establishing a low false-positive rate for large-scale phosphorylation data sets.
Figure 3: Resolving ambiguity in phosphorylation site localization.
Figure 4: Sequest and Mascot can fail to provide proper phosphorylation site placement.
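
The idea behind a probability-based localization score, as referenced in the captions for Figures 3 and 4, can be illustrated with a simplified Python sketch. This is not the published Ascore algorithm, whose details are not given in the text available here. It assumes that each candidate site predicts the same number of site-determining ions, that a random peak matches any one of them with a fixed probability, and that the evidence for a site is scored as -10·log10 of a cumulative binomial probability. All function names, parameters and example numbers are hypothetical.

```python
import math


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_ions, n_matched, p_random):
    """Score the chance of matching at least n_matched ions at random
    as -10*log10(P). Higher scores mean the matches are less likely random."""
    if not 0 < p_random < 1:
        raise ValueError("p_random must lie strictly between 0 and 1")
    if n_matched == 0:
        return 0.0  # P(X >= 0) is exactly 1, so the score is 0
    return -10 * math.log10(binomial_tail(n_ions, n_matched, p_random))


def localization_margin(n_ions, matched_site_a, matched_site_b, p_random):
    """Difference in score between two candidate sites (assumed measure of
    how strongly the spectrum favours site A over site B)."""
    return (match_score(n_ions, matched_site_a, p_random)
            - match_score(n_ions, matched_site_b, p_random))


# Hypothetical example: 8 site-determining ions per candidate site;
# 6 are observed for site A, 2 for site B, with a 5% random-match chance.
margin = localization_margin(n_ions=8, matched_site_a=6, matched_site_b=2, p_random=0.05)
print(f"margin favouring site A: {margin:.1f}")
```

A margin near zero would indicate that the spectrum cannot distinguish the two sites, while a large positive margin favours site A. How peak intensity enters the score is not modelled in this sketch.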