Protein phosphorylation is a key post-translational modification regulating protein function in almost all cellular processes. Although tens of thousands of phosphorylation sites have been identified in human cells, approaches to determine the functional importance of each phosphosite are lacking. Here, we manually curated 112 datasets of phospho-enriched proteins, generated from 104 different human cell types or tissues. We re-analyzed the 6,801 proteomics experiments that passed our quality control criteria, creating a reference phosphoproteome containing 119,809 human phosphosites. To prioritize functional sites, we used machine learning to identify 59 features indicative of proteomic, structural, regulatory or evolutionary relevance and integrate them into a single functional score. Our approach identifies regulatory phosphosites across different molecular mechanisms, processes and diseases, and reveals genetic susceptibilities at a genomic scale. Several regulatory phosphosites were experimentally validated, including identifying a role in neuronal differentiation for phosphosites in SMARCC2, a member of the SWI/SNF chromatin-remodeling complex.
All MS data, including raw and MQ intermediate processing settings and results files, are available in PRIDE under the accession PXD012174. The functional annotation of the phosphoproteome, as well as the gold standard and resulting functional scores, is available in Supplementary Tables. The conditional regulation data used in feature generation are available at http://phosphate.com.
The code used to generate some of the features (for example, age reconstruction or structural hotspots) is available in the respective repositories, as described in Supplementary Note 1. All features, the MS phosphoproteome and the gold standard, as well as the code needed to train and apply the functional score model, are available in the R package funscoR (https://github.com/evocellnet/funscoR).
This study would have been impossible without the selfless deposition of data from hundreds of authors. We extend our gratitude to every one of them. We thank J. Cox for his insightful advice on the site decoy multiple-testing correction. We would like to thank the members of the Beltrao group for their support in collecting features and their relevant comments, as well as D. Ocaña and A. Cafferkey as part of the EMBL-EBI Technical Services Cluster. We thank D. Helm from the EMBL Proteomics Core Facility for help with the analysis of thermal proteome profiling samples. This study has been funded by EMBL core funding and the Wellcome Trust (grant numbers WT101477MA and 208391/Z/17/Z). P.B. and D.O. are supported by a Starting Grant Award from the European Research Council (ERC-2014-STG 638884 PhosFunc).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
At least one identified phosphorylation was required to consider a protein phosphorylated. Protein abundance data were obtained from the PaxDb consensus human proteome (Methods).
Supplementary Figure 2 Functional score performance against effect predictions of phospho-deficient variants when predicting functional phosphosites or disease-related phosphosites.
Functional and disease-related phosphosites were obtained from the PhosphoSitePlus (PSP) database.
Supplementary Figure 3 Examples of experimentally validated phosphosites not included in the training set.
Functional score and position for phosphosites identified in the PRIDE search, colored by the level of functional annotation in PhosphoSitePlus (PSP). Sites marked in red are sites of unknown function in PSP that are supported by experimental evidence in the literature. For example, valosin-containing protein (VCP) pY805 is the highest-scoring phosphosite in the protein (0.65) and is known to disrupt interaction with both PNGase and Ufd3 (PNAS 104:21, 8785-8790, 2007). Similarly, alanine or glutamate mutation of the best-scored S/T site in SDCBP, pS6 (0.81), abolished interaction with ubiquitin, as demonstrated by His-Ub pulldown assays (Journal of Biological Chemistry, 286:45, 39606-39614, 2011). In contrast, phospho-mimicking mutations in the highly scored p62/SQSTM1 pS24 (0.72) restored polymerization (Biochimica et Biophysica Acta - Molecular Cell Research 1843:11, 2765-2774, 2014).
Supplementary Figure 4 Separation of sites of known and unknown function based on their molecular regulatory role.
The molecular function was obtained from PSP. n = total number of identified phosphosites with unknown (top) or known (bottom) regulatory function. Vertical line represents the median value.
Supplementary Figure 5 Identification and characterization of known regulatory sites determining protein binding specificity.
a) b-ions and y-ions for the only four RHOA phosphopeptides identified as phosphorylated at Tyr34. The four peptides were identified in primary AML tumors treated with the protein tyrosine phosphatase inhibitor pervanadate. b) Number of MS/MS identifications in all samples containing modified or unmodified peptides in RHOA. c) Aligned structural data for binding of RHOA to 7 different partners (PDB files: 1tx4, 5hpy and 3msx, left; 4d0n, 4xh9, 2rgn and 1tx4, right). The position of the acceptor tyrosine (red) changes depending on the group of binding partners.
Supplementary Figure 6 Functional score predicts the impact of mutations on protein-protein interactions.
The impact of mutating a phosphosite residue on protein interactions was compiled for a total of 394 human phosphosite positions. For each bin of functional scores, we calculated the fold ratio of observing an effect (gain or loss of interaction) over no effect.
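The per-bin fold ratio described in this caption can be illustrated with a minimal Python sketch. The bin edges, the toy records and the function name `effect_ratio_by_bin` are assumptions made for illustration only; they are not taken from the study or from funscoR.

```python
def effect_ratio_by_bin(records, bin_edges):
    """Ratio of 'effect' to 'no effect' observations per functional-score bin.

    records   : iterable of (functional_score, has_effect) pairs, where
                has_effect is True for a gain or loss of interaction.
    bin_edges : ascending list of edges; bins are [lo, hi), and the last
                bin also includes its upper edge.
    Returns a dict mapping (lo, hi) to the ratio, or None when the bin
    contains no 'no effect' observations (ratio undefined).
    """
    n_bins = len(bin_edges) - 1
    effect = [0] * n_bins
    no_effect = [0] * n_bins
    for score, has_effect in records:
        for i in range(n_bins):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            in_bin = lo <= score < hi or (i == n_bins - 1 and score == hi)
            if in_bin:
                if has_effect:
                    effect[i] += 1
                else:
                    no_effect[i] += 1
                break
    return {
        (bin_edges[i], bin_edges[i + 1]):
            (effect[i] / no_effect[i] if no_effect[i] else None)
        for i in range(n_bins)
    }


# Hypothetical toy data: (functional score, mutation changed an interaction?)
toy_records = [
    (0.10, False), (0.20, False), (0.15, True),
    (0.60, True), (0.70, True), (0.80, False), (0.90, True),
]
ratios = effect_ratio_by_bin(toy_records, [0.0, 0.5, 1.0])
print(ratios)
```

In this toy example the higher-score bin has a larger ratio, which is the kind of trend the figure is described as showing; the real analysis used 394 human phosphosite positions.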
Supplementary Figure 7 Changes in thermal stability and protein abundance levels for GAPDH enzymes for the KO and phospho-mutant strains of TDH3.
Protein abundance and protein thermal stability across the yeast proteome were measured in a Thermal Proteome Profiling (TPP) experiment comparing the TDH3 mutant strains (KO, S149A, S151A) with the WT strain. The same comparison was performed in the presence or absence of doxorubicin. Shown here are the fold changes in abundance and stability of the 3 GAPDH enzymes in yeast (TDH1, TDH2 and TDH3). * denotes abs(score)>2, FDR=0.05 and *** denotes abs(score)>3, FDR=0.01.
For KEGG pathways with more than 20 S. cerevisiae genes, we performed a gene set enrichment (GSEA) test for the fold changes of a given mutant relative to WT. * denotes pathways with significant enrichment after accounting for multiple testing (FDR=0.02).
Supplementary Figure 9 Increased level of Smarcc2/Baf170 protein is detected at day 12 of neuronal differentiation independent of the genetic background.
Each lane represents an individual clone: homozygous (3 biological replicates), heterozygous (2 biological replicates) and control (2 biological replicates). Wild-type (WT) samples are parental mESCs without CRISPR targeting.
Supplementary Figure 10 Morphological differences in day-12 neuronal differentiation for Smarcc2 CRISPR control, heterozygous and homozygous S302A/S304A clones.
Every representative image corresponds to a different biological replicate. Bright field images, 20x.
Supplementary Figure 11 Accumulation of phosphosites as the number of phospho-enriched datasets deposited in PRIDE grows.
Rarefaction curve for random samples of datasets. The total number of sites refers only to phosphosites identified with a localization probability greater than 0.5. Point-ranges represent the binned mean and confidence limits based on non-parametric bootstrap. Polynomial function fitting is displayed as a visual aid. Shaded area represents a confidence interval of 0.995.
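As a rough illustration of how such a rarefaction curve can be computed, the Python sketch below averages the cumulative number of distinct sites over random dataset orderings. It is a simplified stand-in, not the authors' procedure: the bootstrap confidence limits and polynomial fit are omitted, and the function names, site identifiers and toy datasets are hypothetical.

```python
import random


def filter_sites(sites, min_prob=0.5):
    """Keep identifiers of sites whose localization probability exceeds min_prob.

    sites: iterable of (site_id, localization_probability) pairs.
    """
    return {site_id for site_id, prob in sites if prob > min_prob}


def rarefaction_curve(datasets, n_permutations=100, seed=0):
    """Mean number of distinct sites after adding 1..n datasets in random order.

    datasets: list of sets of site identifiers (one set per dataset).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(datasets)
    totals = [0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= datasets[idx]      # accumulate newly observed sites
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical toy input: three datasets of (site, localization probability)
raw = [
    [("P1_S10", 0.99), ("P1_T20", 0.40), ("P2_Y5", 0.80)],
    [("P2_Y5", 0.95), ("P3_S7", 0.60)],
    [("P3_S7", 0.90), ("P4_S1", 0.75), ("P5_T3", 0.51)],
]
toy_datasets = [filter_sites(d) for d in raw]
curve = rarefaction_curve(toy_datasets)
print(curve)
```

A curve that keeps rising as datasets are added suggests that further deposited datasets would still contribute previously unseen sites.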
Supplementary Figs. 1–11 and Supplementary Note 1
PRIDE data included in the reanalysis. The spreadsheet includes the PRIDE datasets under study, specifying each of the raw files included in the search and their search parameters.
Annotated phosphoproteome features that might indicate phosphosite function for the 116,258 sites contained in the subset of 21,009 reviewed proteins within the human UniProt reference proteome.
Phosphosite functional scores of 116,258 scored sites contained in the subset of 21,009 reviewed proteins within the human UniProt reference proteome.
RANBP1 pulldown MS results: results from MS experiments.
ClinVar variants: functional score for the variants associated with human disease that overlap with phosphosites as annotated in ClinVar.
Thermal proteome profiling experiment for Tdh3-mutant strains. Measured changes in protein stability and abundance from the thermal proteome profiling experiment comparing the Tdh3-mutant strains with WT.
About this article
Cite this article
Ochoa, D., Jarnuczak, A.F., Viéitez, C. et al. The functional landscape of the human phosphoproteome. Nat Biotechnol 38, 365–373 (2020). https://doi.org/10.1038/s41587-019-0344-3