Dissecting the sequence determinants for dephosphorylation by the catalytic subunits of phosphatases PP1 and PP2A

The phosphatases PP1 and PP2A are responsible for the majority of dephosphorylation reactions on phosphoserine (pSer) and phosphothreonine (pThr), and are involved in virtually all cellular processes and numerous diseases. In cells, the catalytic subunits exist in the form of holoenzymes, which impart substrate specificity. How the catalytic subunits themselves contribute to substrate recognition is unclear. By developing a phosphopeptide library approach and a phosphoproteomic assay, we demonstrate that the specificity of PP1 and PP2A holoenzymes for pThr, and of PP1 for basic motifs adjacent to the phosphorylation site, is due to intrinsic properties of the catalytic subunits. Thus, we dissect this amino acid specificity of the catalytic subunits from the contribution of regulatory proteins. Furthermore, our approach enabled us to discover a role for PP1 as a regulator of the GRB-associated-binding protein 2 (GAB2)/14-3-3 complex. Beyond this, we expect the approach to be broadly applicable for detecting enzyme-substrate recognition preferences.
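
To make the notion of a "basic motif adjacent to the phosphorylation site" concrete, the following minimal Python sketch extracts a sequence window around a Ser/Thr phosphoacceptor and counts Arg and Lys residues in it. It is an illustration only, not the authors' analysis pipeline: the ±4-residue window, the restriction of "basic" to Arg/Lys, the function name site_context and the example peptide are all hypothetical choices.

```python
from dataclasses import dataclass

# Assumption: only Arg (R) and Lys (K) count as "basic"; His is excluded here.
BASIC_RESIDUES = frozenset("RK")


@dataclass(frozen=True)
class SiteContext:
    residue: str      # phosphoacceptor: "S" (pSer) or "T" (pThr)
    window: str       # flanking sequence including the acceptor itself
    basic_count: int  # number of R/K residues in the window


def site_context(sequence: str, position: int, flank: int = 4) -> SiteContext:
    """Describe the sequence around a Ser/Thr phosphosite (hypothetical helper).

    `sequence` is an upper-case one-letter protein sequence and `position`
    is the 0-based index of the phosphoacceptor. `flank` (hypothetical
    default of 4) sets how many residues on each side are inspected; the
    window is truncated at the sequence ends.
    """
    if not 0 <= position < len(sequence):
        raise IndexError(f"position {position} outside sequence")
    residue = sequence[position]
    if residue not in ("S", "T"):
        raise ValueError(f"residue at {position} is {residue!r}, not Ser/Thr")
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:end]
    # The central S/T is never basic, so counting over the whole window is safe.
    basic_count = sum(1 for aa in window if aa in BASIC_RESIDUES)
    return SiteContext(residue=residue, window=window, basic_count=basic_count)


if __name__ == "__main__":
    # Hypothetical peptide with a Thr acceptor preceded by R-R-K.
    print(site_context("GRRKTPSAE", 4))
```

For example, site_context("GRRKTPSAE", 4) reports a Thr acceptor with three basic residues (R, R, K) in its window. A real motif analysis would compare such counts between efficiently and poorly dephosphorylated sites rather than inspect single peptides.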


Maja Köhn
Jun 16, 2020

Software and code

Data collection
Mass spectrometry measurements were performed on Q Exactive Plus Orbitrap or Orbitrap Fusion Lumos mass spectrometers (Thermo Fisher Scientific) using commercial operating software from Thermo Fisher Scientific. Gel filtration was performed on an ÄKTA Explorer system using the commercial software Unicorn v5. Live-cell imaging was performed on an A1R confocal scanning system mounted on a Nikon Ti-E inverted microscope using the commercial operating software NIS-Elements 4.5. Western blots were imaged on a ChemiDoc Touch Imaging System (Bio-Rad) or a Fusion FX Imaging System (Vilber). Biochemical assays were read on a Synergy H4 microplate reader (BioTek) using the commercial Gen5 software.

Data analysis
Mascot 2.4 (Matrix Science) was used for peptide searching; isobarQuant (https://github.com/protcode/isob/archive/1.1.0.zip) and MaxQuant (v1.6.0.16) were used to quantify peptide and protein abundances; R (v3.3.2, v3.5.1) and Perseus (v1.5.8.5) were used for custom data analysis and statistical analysis. Image analysis was carried out in Fiji (Fiji Is Just ImageJ) v2.0.0-rc-34/1.50a, and Western blots were quantified in Image Lab (Bio-Rad).

Data
Mass spectrometry data have been deposited at the ProteomeXchange Consortium (proteomecentral.proteomexchange.org) via the PRIDE partner repository with the identifiers PXD012026 and PXD013775. The results of all MS-based experiments are furthermore summarized in Supplementary Tables 1-3 and Supplementary Data 1-3. Supplementary Tables 1-3 and Supplementary Figures 1-10 are found in the Supplementary Information; Supplementary Data 1-3 are provided as separate Excel files. The source data underlying Figures 1c,d; 3b,d; 4a,b; 5a; 6a,b,5,6,7a,b and 8a,b, data for quality control of directed peptide synthesis and certificates for cell line authentication are provided as a Source Data file. MS reference datasets were downloaded from Swiss-Prot/UniProtKB (www.uniprot.org). Protein structures were downloaded from www.rcsb.org/pdb.org using identifiers 2NPP, 3EGG, 4I5L and 3DW8. Phosphorylation site annotation was downloaded from www.phosphosite.org (accessed 3 July 2018), and datasets for phosphatase regulators were obtained from genenames.org and uniprot.org (accessed March/April 2020). All other data supporting the findings presented herein are available from the authors upon request.

Life sciences study design
Sample size
Biological triplicates (n = 3) were used. Triplicates are standard in basic life science research and have yielded conclusive results with appropriate statistical power in previously published MS studies and molecular biology experiments using similar approaches.
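
As a hedged illustration of what a two-group comparison with n = 3 biological replicates involves, the Python sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. Welch's test, the function name welch_t and the example values are assumptions chosen for illustration; they are not necessarily the test or data used in this study, whose statistical analysis was carried out in R and Perseus.

```python
import math
from statistics import mean, stdev


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df.

    The two-sided P value would be 2 * (1 - CDF_t(|t|, df)); it is not
    computed here because the Student t CDF is not in the standard library.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two replicates per group")
    # Squared standard errors of each group mean.
    se2_a = stdev(a) ** 2 / len(a)
    se2_b = stdev(b) ** 2 / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical log2 intensities for three biological replicates per condition.
    control = [1.0, 2.0, 3.0]
    treated = [4.0, 5.0, 6.0]
    t, df = welch_t(control, treated)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

With the example values, both groups have a standard deviation of 1, giving t ≈ -3.674 with 4 degrees of freedom. For two groups of three, the Welch-Satterthwaite degrees of freedom always lie between 2 and 4 (that is, between min(n1, n2) - 1 and n1 + n2 - 2).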
Data exclusions
Raw data of the MS experiments deposited in the PRIDE repository contain additional channels that are not shown in the manuscript because they did not fulfill quality criteria. These channels were not used for any finding presented in the manuscript, and the decision to exclude them was pre-established before proceeding with biological interpretation.
Replication
All findings presented in the manuscript are based on at least three successful replicates, and all replication attempts under the same conditions were successful. For Supplementary Figures 5 and 6, gel filtration was performed only once per condition, because the elution volume of proteins and complexes in gel filtration is highly reproducible; as in HPLC analysis of chemical compounds, a single injection is standard.