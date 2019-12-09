Abstract
Protein phosphorylation is a key post-translational modification regulating protein function in almost all cellular processes. Although tens of thousands of phosphorylation sites have been identified in human cells, approaches to determine the functional importance of each phosphosite are lacking. Here, we manually curated 112 datasets of phospho-enriched proteins, generated from 104 different human cell types or tissues. We re-analyzed the 6,801 proteomics experiments that passed our quality control criteria, creating a reference phosphoproteome containing 119,809 human phosphosites. To prioritize functional sites, we used machine learning to identify 59 features indicative of proteomic, structural, regulatory or evolutionary relevance and integrate them into a single functional score. Our approach identifies regulatory phosphosites across different molecular mechanisms, processes and diseases, and reveals genetic susceptibilities at a genomic scale. Several regulatory phosphosites were experimentally validated, including identifying a role in neuronal differentiation for phosphosites in SMARCC2, a member of the SWI/SNF chromatin-remodeling complex.
Data availability
All MS data, including raw files and MaxQuant (MQ) intermediate processing settings and results files, are available in PRIDE under the accession PXD012174. The functional annotation of the phosphoproteome, as well as the gold standard and resulting functional scores, is available in the Supplementary Tables. The conditional regulation data used in feature generation are available at http://phosfate.com.
Code availability
The code used to generate some of the features (for example, age reconstruction or structural hotspots) is available in the respective repositories, as described in Supplementary Note 1. All features, the MS phosphoproteome and the gold standard, as well as the code needed to train and apply the functional score model, are available in the R package funscoR (https://github.com/evocellnet/funscoR).
Acknowledgements
This study would have been impossible without the selfless deposition of data from hundreds of authors. We extend our gratitude to every one of them. We thank J. Cox for his insightful advice on the site decoy multiple-testing correction. We would like to thank the members of the Beltrao group for their support in collecting features and their relevant comments, as well as D. Ocaña and A. Cafferkey as part of the EMBL-EBI Technical Services Cluster. We thank D. Helm from the EMBL Proteomics Core Facility for help with the analysis of thermal proteome profiling samples. This study has been funded by EMBL core funding and the Wellcome Trust (grant numbers WT101477MA and 208391/Z/17/Z). P.B. and D.O. are supported by a Starting Grant Award from the European Research Council (ERC-2014-STG 638884 PhosFunc).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Integrated supplementary information
Supplementary Figure 1 Ratio of phosphorylated proteins binned by the protein consensus abundance.
At least one identified phosphorylation was required to consider a protein phosphorylated. Protein abundance data were obtained from the PaxDb consensus human proteome (Methods).
Supplementary Figure 2 Functional score performance against effect predictions of phospho-deficient variants when predicting functional phosphosites or disease-related phosphosites.
Functional and disease-related phosphosites were obtained from the PSP database.
Supplementary Figure 3 Examples of experimentally validated phosphosites not included in the training set.
Functional score and position for phosphosites identified in the PRIDE search, colored by the level of functional annotation in PhosphoSitePlus (PSP). Sites marked in red are sites of unknown function in PSP that are supported by experimental evidence in the literature. For example, valosin-containing protein (VCP) pY805 is the highest-scoring phosphosite in the protein (0.65) and is known to disrupt interaction with both PNGase and Ufd3 (PNAS 104:21, 8785-8790, 2007). Similarly, alanine or glutamate mutation of the best-scored S/T site pS6 (0.81) in SDCBP abolished interaction with ubiquitin, as demonstrated by His-Ub pulldown assays (Journal of Biological Chemistry 286:45, 39606-39614, 2011). Phospho-mimicking mutations in the highly scored p62/SQSTM1 pS24 (0.72) restored polymerization instead (Biochimica et Biophysica Acta - Molecular Cell Research 1843:11, 2765-2774, 2014).
Supplementary Figure 4 Separation of sites of known and unknown function based on their molecular regulatory role.
The molecular function was obtained from PSP. n = total number of identified phosphosites with unknown (top) or known (bottom) regulatory function. Vertical line represents the median value.
Supplementary Figure 5 Identification and characterization of known regulatory sites determining protein binding specificity.
a) b-ions and y-ions for the only 4 RHOA phosphopeptides identified as phosphorylated at Tyr34. The 4 peptides were identified in primary AML tumors treated with the protein tyrosine phosphatase inhibitor pervanadate. b) Number of MS/MS identifications in all samples containing modified or unmodified peptides in RHOA. c) Aligned structural data for binding of RHOA with 7 different partners (PDB files: 1tx4, 5hpy and 3msx, left; 4d0n, 4xh9, 2rgn and 1tx4, right). The position of the acceptor tyrosine (red) changes depending on the group of binding partners.
Supplementary Figure 6 Functional score predicts the impact of mutations on protein-protein interactions.
The impact of mutating a phosphosite residue on protein interactions was compiled for a total of 394 human phosphosite positions. For each bin of functional scores, we calculated the fold ratio of observing an effect (gain or loss of interaction) over no effect.
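The per-bin ratio described above can be illustrated with a minimal Python sketch. It assumes the ratio is simply the number of sites with an observed effect divided by the number without one inside each score bin; the function name, bin edges and toy data are hypothetical and are not taken from the study.

```python
def effect_ratio_by_bin(sites, edges):
    """Ratio of sites with an effect to sites without one, per score bin.

    sites: iterable of (functional_score, has_effect) pairs (hypothetical input).
    edges: ascending bin edges; bins are half-open [lo, hi), except the last
           bin, which also includes its upper edge.
    """
    n_bins = len(edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]  # [n_effect, n_no_effect] per bin
    for score, has_effect in sites:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= score < edges[i + 1] or (is_last and score == edges[i + 1]):
                counts[i][0 if has_effect else 1] += 1
                break
    ratios = {}
    for i, (n_effect, n_none) in enumerate(counts):
        # The ratio is undefined when no "no effect" site falls in the bin.
        ratios[(edges[i], edges[i + 1])] = n_effect / n_none if n_none else None
    return ratios


# Toy data only: (score, whether mutating the site changed an interaction).
toy_sites = [(0.1, False), (0.2, False), (0.3, True),
             (0.6, True), (0.7, True), (0.9, False)]
toy_ratios = effect_ratio_by_bin(toy_sites, [0.0, 0.5, 1.0])
```

In this toy input the lower bin holds one site with an effect and two without, and the upper bin holds two with an effect and one without.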
Supplementary Figure 7 Changes in thermal stability and protein abundance levels for GAPDH enzymes for the KO and phospho-mutant strains of TDH3.
The protein abundance and protein thermal stability of the yeast proteome were measured using a Thermal Proteome Profiling (TPP) experiment, comparing the TDH3 mutant strains (KO, S149A, S151A) with the WT strain. The same comparison was performed in the presence or absence of doxorubicin. Shown here are the fold changes in abundance and stability of the 3 GAPDH enzymes in yeast (TDH1, TDH2 and TDH3). * denotes abs(score)>2, FDR=0.05 and *** denotes abs(score)>3, FDR=0.01.
Supplementary Figure 8 Gene set enrichment analysis for KEGG pathways.
For KEGG pathways with more than 20 S. cerevisiae genes, we performed a gene set enrichment analysis (GSEA) test on the fold changes of a given mutant relative to WT. * denotes pathways with significant enrichment after accounting for multiple testing (FDR=0.02).
Supplementary Figure 9 Increased level of Smarcc2/Baf170 protein is detected at day 12 of neuronal differentiation independent of the genetic background.
Each lane represents an individual clone: homozygous (3 biological replicates), heterozygous (2 biological replicates) and control (2 biological replicates). Wild type (WT) samples are parental mESCs without CRISPR targeting.
Supplementary Figure 10 Morphological differences in day-12 neuronal differentiation for Smarcc2 CRISPR control, heterozygous and homozygous S302A/S304A clones.
Every representative image corresponds to a different biological replicate. Bright field images, 20x.
Supplementary Figure 11 Accumulation of phosphosites as the number of phospho-enriched datasets deposited in PRIDE grows.
Rarefaction curve for random samples of datasets. The total number of sites refers only to phosphosites identified with a localization probability greater than 0.5. Point-ranges represent the binned mean and confidence limits based on non-parametric bootstrap. Polynomial function fitting is displayed as a visual aid. Shaded area represents a confidence interval of 0.995.
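A rarefaction curve of this kind can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the procedure used for the figure: each dataset is assumed to be a set of site identifiers already filtered by localization probability, and the function names, number of draws and percentile bootstrap settings are hypothetical.

```python
import random


def rarefaction(datasets, n_draws=200, seed=0):
    """For each sample size k, draw k datasets at random (without replacement)
    n_draws times and record how many distinct sites their union contains."""
    rng = random.Random(seed)
    curve = {}
    for k in range(1, len(datasets) + 1):
        totals = []
        for _ in range(n_draws):
            sample = rng.sample(datasets, k)
            totals.append(len(set().union(*sample)))
        curve[k] = totals
    return curve


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Mean of values with a percentile confidence interval obtained by
    resampling values with replacement (non-parametric bootstrap)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, lower, upper


# Toy data only: three "datasets", each a set of site identifiers.
toy_datasets = [{"a", "b"}, {"b", "c"}, {"d"}]
toy_curve = rarefaction(toy_datasets, n_draws=50)
toy_mean, toy_lower, toy_upper = bootstrap_mean_ci(toy_curve[2])
```

Plotting the mean and interval for each k against k gives the accumulation curve; a flattening curve would suggest that additional datasets contribute few new sites.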
Supplementary information
Supplementary Materials
Supplementary Figs. 1–11 and Supplementary Note 1
Supplementary Table 1
PRIDE data included in the reanalysis. The spreadsheet includes the PRIDE datasets under study, specifying each of the raw files included in the search and their search parameters.
Supplementary Table 2
Annotated phosphoproteome features that might indicate phosphosite function for the 116,258 sites contained in the subset of 21,009 reviewed proteins within the human UniProt reference proteome.
Supplementary Table 3
Phosphosite functional scores of 116,258 scored sites contained in the subset of 21,009 reviewed proteins within the human UniProt reference proteome.
Supplementary Table 4
RANBP1 pulldown MS results: results from MS experiments.
Supplementary Table 5
ClinVar variants: functional score for the variants associated with human disease that overlap with phosphosites as annotated in ClinVar.
Supplementary Table 6
Thermal proteome profiling experiment for Tdh3-mutant strains. Measured changes in protein stability and abundance from the thermal proteome profiling experiment comparing the Tdh3-mutant strains with WT.
Cite this article
Ochoa, D., Jarnuczak, A.F., Viéitez, C. et al. The functional landscape of the human phosphoproteome. Nat Biotechnol (2019) doi:10.1038/s41587-019-0344-3