An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets

Schwartz, Daniel; Gygi, Steven P

doi:10.1038/nbt1146

Analysis
Published: 04 November 2005


Nature Biotechnology volume 23, pages 1391–1398 (2005)


Abstract

With the recent exponential increase in protein phosphorylation sites identified by mass spectrometry, a unique opportunity has arisen to understand the motifs surrounding such sites. Here we present an algorithm designed to extract motifs from large data sets of naturally occurring phosphorylation sites. The methodology relies on the intrinsic alignment of phospho-residues and the extraction of motifs through iterative comparison to a dynamic statistical background. Results show the identification of dozens of novel and known phosphorylation motifs from recently published serine, threonine and tyrosine phosphorylation studies. When applied to a linguistic data set to test the versatility of the approach, the algorithm successfully extracted hundreds of language motifs. This method, in addition to shedding light on the consensus sequences of identified and as yet unidentified kinases and modular protein domains, may also eventually be used as a tool to determine potential phosphorylation sites in proteins of interest.
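The iterative comparison to a dynamic background described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the statistical test, so the binomial tail probability, the `alpha` and `min_count` thresholds, and all function names here are assumptions chosen for illustration. Sequences are assumed to be equal-length windows already aligned on the phospho-residue at index `center`.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumed significance test."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def build_motif(fg, bg, center, alpha=1e-3, min_count=2):
    """Greedily fix the most over-represented (position, residue) pairs.

    After each pair is fixed, both the foreground and the background are
    reduced to the sequences that contain it, so the background against
    which the next pair is scored changes at every step.
    """
    motif = {}
    while fg and bg:
        best = None
        for pos in range(len(fg[0])):
            if pos == center or pos in motif:
                continue
            fg_counts = Counter(s[pos] for s in fg)
            bg_counts = Counter(s[pos] for s in bg)
            for res, k in fg_counts.items():
                if k < min_count:
                    continue
                p = bg_counts[res] / len(bg)  # background frequency
                pval = binomial_tail(k, len(fg), p)
                if pval < alpha and (best is None or pval < best[0]):
                    best = (pval, pos, res)
        if best is None:
            break  # nothing significant remains
        _, pos, res = best
        motif[pos] = res
        fg = [s for s in fg if s[pos] == res]
        bg = [s for s in bg if s[pos] == res]
    return motif


def matches(seq, motif):
    """True if seq carries every fixed residue of the motif."""
    return all(seq[pos] == res for pos, res in motif.items())


def extract_motifs(fg, bg, center, alpha=1e-3, min_count=2):
    """Extract motifs one at a time, removing matched sequences each round."""
    fg, bg = list(fg), list(bg)
    motifs = []
    while True:
        motif = build_motif(fg, bg, center, alpha, min_count)
        if not motif:
            break
        motifs.append(motif)
        fg = [s for s in fg if not matches(s, motif)]
        bg = [s for s in bg if not matches(s, motif)]
    return motifs
```

Each returned motif is a dictionary mapping a window position to its fixed residue. Calling `extract_motifs(foreground, background, center=2)` on five-residue windows would, for example, report a residue enriched at position 3 as `{3: ...}`. Removing matched sequences after each round is what lets less frequent motifs surface in later rounds of this sketch.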


**Figure 1: Overview of motif-building strategy.**

**Figure 2: Sequence logo representations of various extracted motifs.**



Acknowledgements

The authors thank John Rush and Cell Signaling Technology for providing access to the tyrosine phosphorylation data sets prior to their publication. Additionally, D.S. wishes to thank Michael Chou for assistance with the Moby Dick analysis as well as numerous stimulating conversations regarding the algorithm and critical reading of the manuscript. This work was supported in part by National Institutes of Health grant HG03456 (S.P.G.).

Author information

Authors and Affiliations

Department of Cell Biology, Harvard Medical School, 240 Longwood Ave., Boston, Massachusetts 02115, USA
Daniel Schwartz & Steven P Gygi


Corresponding author

Correspondence to Daniel Schwartz.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1 (XLS 11 kb)

Supplementary Table 2 (XLS 9 kb)

Supplementary Table 3 (XLS 405 kb)

Supplementary Table 4 (XLS 167 kb)

Supplementary Table 5 (XLS 145 kb)

Supplementary Table 6 (XLS 36 kb)

Supplementary Table 7 (XLS 11 kb)

Supplementary Table 8 (XLS 11 kb)

Supplementary Table 9 (XLS 11 kb)

Supplementary Table 10 (XLS 11 kb)


About this article

Cite this article

Schwartz, D., Gygi, S. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol 23, 1391–1398 (2005). https://doi.org/10.1038/nbt1146

Issue Date: 01 November 2005
DOI: https://doi.org/10.1038/nbt1146
