E3 ligases and degrons, the sequences they recognize in target proteins, are key parts of the ubiquitin-mediated proteolysis system. There are several examples of alterations of these two components of the system that have a role in cancer. Here we uncover the landscape of the contribution of such alterations to tumorigenesis across cancer types. We first systematically identified new instances of degrons across the human proteome by using a random forest classifier and validated the functionality of a dozen of them, exploiting somatic mutations across >7,000 tumors. We detected signals of positive selection across known and new degron instances. Our results reveal that several oncogenes are frequently targeted by mutations that affect the sequence of their degrons or their cognate E3 ubiquitin ligases, causing an abnormal increase in their protein abundance. Overall, an important number of driver mutations across primary tumors affect either degrons or E3-ubiquitin ligases.
All data used in the analyses described in the paper are freely available within the public domain. The human tumor RNA-seq, somatic mutation and CNA data were derived from the TCGA Research Network. Specific links to each of the TCGA datasets are detailed in the Methods. Published TCGA RPPA data were downloaded from the TCPA portal (http://tcpaportal.org/tcpa/download.html; version 4.2). MS datasets that were published previously and were reanalyzed here are available from the Clinical Proteomic Tumor Analysis Consortium. The CCLE datasets reanalyzed here can be obtained from the Broad Institute portal (https://portals.broadinstitute.org/ccle/data). Specific links to each of the CCLE datasets are detailed in the Methods. The list of proteins involved in ubiquitination (UBSs) and deubiquitination (DUBs) was manually created by integrating previous knowledge from UniProt and E3NET (see above). Human protein–protein interaction data are available at STRING (9606.protein.links.detailed.v10.5.txt.gz; 19 February 2018). Amino acid sequences from 32,022 reviewed human protein isoforms are available from UniProt (see above). Degron motifs and degron instances in the human proteome are available at ELM (http://elm.eu.org/downloads.html; 15 May 2019) and previous studies (see above). Phosphorylation and ubiquitination sites are available at PhosphoSitePlus (https://www.phosphosite.org/; 4 October 2018). The structures of an NFE2L2 fragment in complex with KEAP1 and the BTRC degron of CTNNB1 are available at PDB (see above). Pan-cancer gene fusions in the TCGA cohort are available at the Tumor Fusion Gene Data Portal (http://www.tumorfusions.org/; 10 July 2018). The list of cancer-related genes is available at the Cancer Gene Census (downloaded on 5 June 2019). The biomarkers of anticancer drug response are available from the Cancer Genome Interpreter (https://www.cancergenomeinterpreter.org/).
All software and data produced as part of the study (including scripts needed to reproduce all results described in the paper) are available at https://bitbucket.org/account/user/bbglab/projects/PD.
Al-Hakim, A. et al. The ubiquitous role of ubiquitin in the DNA damage response. DNA Repair 9, 1229–1240 (2010).
Arlow, T., Scott, K., Wagenseller, A. & Gammie, A. Proteasome inhibition rescues clinically significant unstable variants of the mismatch repair protein Msh2. Proc. Natl Acad. Sci. USA 110, 246–251 (2013).
Bassermann, F., Eichner, R. & Pagano, M. The ubiquitin proteasome system—implications for cell cycle control and the targeted treatment of cancer. Biochim. Biophys. Acta Mol. Cell Res. 1843, 150–162 (2014).
Ciechanover, A., Heller, H., Elias, S., Haas, A. L. & Hershko, A. ATP-dependent conjugation of reticulocyte proteins with the polypeptide required for protein degradation. Proc. Natl Acad. Sci. USA 77, 1365–1368 (1980).
Gillette, T. G. et al. Distinct functions of the ubiquitin–proteasome pathway influence nucleotide excision repair. EMBO J. 25, 2529–2538 (2006).
Guharoy, M., Bhowmick, P., Sallam, M. & Tompa, P. Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin–proteasome system. Nat. Commun. 7, 10239 (2016).
Hershko, A., Ciechanover, A., Heller, H., Haas, A. L. & Rose, I. A. Proposed role of ATP in protein breakdown: conjugation of protein with multiple chains of the polypeptide of ATP-dependent proteolysis. Proc. Natl Acad. Sci. USA 77, 1783–1786 (1980).
Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
Mészáros, B., Kumar, M., Gibson, T. J., Uyar, B. & Dosztányi, Z. Degrons in cancer. Sci. Signal. 10, eaak9982 (2017).
Yoo, S.-H. et al. Competing E3 ubiquitin ligases govern circadian periodicity by degradation of CRY in nucleus and cytoplasm. Cell 152, 1091–1105 (2013).
Stewart, M. D., Ritterhoff, T., Klevit, R. E. & Brzovic, P. S. E2 enzymes: more than just middle men. Cell Res. 26, 423–440 (2016).
Braten, O. et al. Numerous proteins with unique characteristics are degraded by the 26S proteasome following monoubiquitination. Proc. Natl Acad. Sci. USA 113, E4639–E4647 (2016).
Komander, D., Clague, M. J. & Urbé, S. Breaking the chains: structure and function of the deubiquitinases. Nat. Rev. Mol. Cell Biol. 10, 550–563 (2009).
Vu, P. K. & Sakamoto, K. M. Ubiquitin-mediated proteolysis and human disease. Mol. Genet. Metab. 71, 261–266 (2000).
Ge, Z. et al. Integrated genomic analysis of the ubiquitin pathway across cancer types. Cell Rep. 23, 213–226 (2018).
Dinkel, H. et al. ELM 2016—data update and new functionality of the Eukaryotic Linear Motif resource. Nucleic Acids Res. 44, D294–D300 (2016).
Bateman, A. et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
Kim, T. Y. et al. Substrate trapping proteomics reveals targets of the βTrCP2/FBXW11 ubiquitin ligase. Mol. Cell. Biol. 35, 167–181 (2015).
Arabi, A. et al. Proteomic screen reveals Fbw7 as a modulator of the NF-κB pathway. Nat. Commun. 3, 976 (2012).
Franceschini, A. et al. STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013).
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281 (2018).
Li, J. et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047 (2013).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Shibata, T. et al. Cancer related mutations in NRF2 impair its recognition by Keap1–Cul3 E3 ligase and promote malignancy. Proc. Natl Acad. Sci. USA 105, 13568–13573 (2008).
Liu, C. et al. β-Trcp couples β-catenin phosphorylation–degradation and regulates Xenopus axis formation. Proc. Natl Acad. Sci. USA 96, 6273–6278 (1999).
Santra, M. K., Wajapeyee, N. & Green, M. R. F-box protein FBXO31 mediates cyclin D1 degradation to induce G1 arrest after DNA damage. Nature 459, 722–725 (2009).
Li, Y. et al. Structural basis of the phosphorylation-independent recognition of cyclin D1 by the SCF FBXO31 ubiquitin ligase. Proc. Natl Acad. Sci. USA 115, 319–324 (2018).
Lukashchuk, N. & Vousden, K. H. Ubiquitination and degradation of mutant p53. Mol. Cell. Biol. 27, 8284–8295 (2007).
Wawrzynow, B., Zylicz, A. & Zylicz, M. Chaperoning the guardian of the genome. The two-faced role of molecular chaperones in p53 tumor suppressor action. Biochim. Biophys. Acta Rev. Cancer 1869, 161–174 (2018).
Qiu, X.-B. & Goldberg, A. L. Nrdp1/FLRF is a ubiquitin ligase promoting ubiquitination and degradation of the epidermal growth factor receptor family member, ErbB3. Proc. Natl Acad. Sci. USA 99, 14843–14848 (2002).
Huang, Z. et al. The E3 ubiquitin ligase NEDD4 negatively regulates HER3/ErbB3 level and signaling. Oncogene 34, 1105–1115 (2015).
Lu, Z., Xu, S., Joazeiro, C., Cobb, M. H. & Hunter, T. The PHD domain of MEKK1 acts as an E3 ubiquitin ligase and mediates ubiquitination and degradation of ERK1/2. Mol. Cell 9, 945–956 (2002).
Nakamura, M., Tokunaga, F., Sakata, S. & Iwai, K. Mutual regulation of conventional protein kinase C and a ubiquitin ligase complex. Biochem. Biophys. Res. Commun. 351, 340–347 (2006).
Chen, D. et al. Amplitude control of protein kinase C by RINCK, a novel E3 ubiquitin ligase. J. Biol. Chem. 282, 33776–33787 (2007).
Saei, A. et al. Loss of USP28-mediated BRAF degradation drives resistance to RAF cancer therapies. J. Exp. Med. 215, 1913–1928 (2018).
Hernandez, M. A. et al. Regulation of BRAF protein stability by a negative feedback loop involving the MEK–ERK pathway but not the FBXW7 tumour suppressor. Cell. Signal. 28, 561–571 (2016).
Galligan, J. T. et al. Proteomic analysis and identification of cellular interactors of the giant ubiquitin ligase HERC2. J. Proteome Res. 14, 953–966 (2015).
Li, D. et al. ARAF recurrent mutation causes central conducting lymphatic anomaly treatable with a MEK inhibitor. Nat. Med. 25, 1116–1122 (2019).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Gonzalez-Perez, A. et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat. Methods 10, 1081–1082 (2013).
Tamborero, D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci. Rep. 3, 2650 (2013).
Mularoni, L. et al. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Sun, X.-X. et al. The nucleolar ubiquitin-specific protease USP36 deubiquitinates and stabilizes c-Myc. Proc. Natl Acad. Sci. USA 112, 3734–3739 (2015).
Futreal, A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
Tamborero, D. et al. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 10, 25 (2018).
Hausser, J., Syed, A. P., Bilen, B. & Zavolan, M. Analysis of CDS-located miRNA target sites suggests that they can effectively inhibit translation. Genome Res. 23, 604–615 (2013).
Gonzalez-Perez, A. Circuits of cancer drivers revealed by convergent misregulation of transcription factor targets across tumor types. Genome Med. 8, 6 (2016).
Gonzalez-Perez, A., Jene-Sanz, A. & Lopez-Bigas, N. The mutational landscape of chromatin regulatory factors across 4,623 tumor samples. Genome Biol. 14, R106 (2013).
Frigola, J., Iturbide, A., Lopez-Bigas, N., Peiro, S. & Gonzalez-Perez, A. Altered oncomodules underlie chromatin regulatory factors driver mutations. Oncotarget 7, 30748–30759 (2016).
Sabarinathan, R. et al. The whole-genome panorama of cancer drivers. Preprint at bioRxiv https://doi.org/10.1101/190330 (2017).
Zhang, H. et al. Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166, 755–765 (2016).
Mertins, P. et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
Wei, L. et al. TCGA-assembler 2: software pipeline for retrieval and processing of TCGA/CPTAC data. Bioinformatics 34, 1615–1617 (2018).
Han, Y., Lee, H., Park, J. C. & Yi, G.-S. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions. Mol. Cell. Proteomics 11, O111.014076 (2012).
Hornbeck, P. V. et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43, D512–D520 (2015).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Zhou, W. et al. TransVar: a multilevel variant annotator for precision genomics. Nat. Methods 12, 1002–1003 (2015).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696 (2018).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Perez, F. & Granger, B. E. IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9, 21–29 (2007).
McKinney, W. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (O’Reilly Media, Inc., 2017).
Oliphant, T. E. Guide to NumPy (CreateSpace Independent Publishing Platform, 2015).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. et al. seaborn v0.5.0 Zenodo https://doi.org/10.5281/zenodo.12710 (2014).
Jolly, K. Hands-On Data Visualization with Bokeh: Interactive Web Plotting for Python Using Bokeh (Packt Publishing, 2018).
N.L.-B. acknowledges funding from the European Research Council (consolidator grant 682398) and the ERDF/Spanish Ministry of Science, Innovation and Universities–Spanish State Research Agency/DamReMap Project (RTI2018-094095-B-I00). A.G.-P. is supported by a Ramón y Cajal contract (RYC-2013-14554). IRB Barcelona is a recipient of a Severo Ochoa Centre of Excellence Award from the Spanish Ministry of Economy and Competitiveness (MINECO; Government of Spain) and is supported by CERCA (Generalitat de Catalunya). The results shown here are in whole or part based upon data generated by the TCGA Research Network. Data used in this publication were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
(a) Distribution of the values of biochemical properties of annotated degron instances and equally long randomly chosen sequences from the human proteome. The p-values were derived from two-tailed Mann-Whitney tests. Left N: number of validated degron instances; right N: number of random protein sequences sampled from the proteome. (b) Over- or under-representation of each amino acid (Fisher’s exact test odds ratio) across the sequence of annotated degron instances. Significant cases (p-value < 0.05) are circled in black. Relevant numbers are defined in (a). (c) Stratified 5-fold cross-validation ROC curve (as Fig. 1c) of a random forest classifier trained on annotated degrons and random sequences from the same set of proteins. Relevant numbers are defined in (a). (d) Precision/Recall of the random forest classifier described in the main paper (5-fold cross-validation). Relevant numbers are defined in (a). (e) Stratified 5-fold cross-validation ROC curve of a random forest classifier trained as described in the main paper, but adding random features highly correlated to the 11 used in the main paper. Relevant numbers are defined in (a). (f, g) Biochemical features at the top of the list of importance according to the classifiers trained in the main paper and above, for panel c. Bars represent the mean importance of each feature across the dataset, with the whiskers representing one standard deviation. (h) Stratified 5-fold cross-validation ROC curve resulting from the classification (with the random forest classifier described in the main paper) of experimentally identified FBXW11 degron instances and amino acid sequences of the same length randomly sampled from human proteins. Number of positive and negative instances defined at the top of the panel.
(i) Stratified 5-fold cross-validation ROC curve resulting from the classification (with the random forest classifier described in the main paper) of experimentally identified FBXW11 degrons and random amino acid sequences from proteins deemed non-FBXW11 targets. Number of positive and negative instances defined at the top of the panel. (j) Stratified 5-fold cross-validation ROC curve resulting from the classification (with the random forest classifier described in the main paper) of experimentally identified FBXW7 degrons and amino acid sequences of the same length randomly sampled from human proteins. Number of positive and negative instances defined at the top of the panel. (k-m) Correlation between the length of proteins and the number of matches (k), novel degron instances (l), novel degron instances with annotations (m) in their sequence. The numbers shown (R-value) correspond to the Pearson’s correlation coefficient. The trendline and its confidence intervals are shown as a line and a shaded area, respectively. N: number of proteins (k), novel degron instances (l), or novel degron instances with further supporting information (m).
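As an illustration of the quantity plotted in panel (b), the following is a minimal sketch of how a per-residue odds ratio and a two-sided Fisher's exact p-value could be computed from residue counts. The function names and the toy sequences are hypothetical, and the pooling of all residues into a single 2x2 table is an assumption; the caption does not specify how the contingency tables were built.

```python
from collections import Counter
from math import comb
from typing import Iterable, Tuple


def residue_table(residue: str, degron_seqs: Iterable[str],
                  background_seqs: Iterable[str]) -> Tuple[int, int, int, int]:
    """Build a 2x2 table: (residue in degrons, other residues in degrons,
    residue in background, other residues in background)."""
    deg = Counter("".join(degron_seqs))
    bg = Counter("".join(background_seqs))
    a = deg[residue]
    b = sum(deg.values()) - a
    c = bg[residue]
    d = sum(bg.values()) - c
    return a, b, c, d


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); > 1 suggests over-representation."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value: sum the hypergeometric probabilities
    of all tables (fixed margins) that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy example (made-up sequences, not data from the study).
table = residue_table("S", ["SSAA"], ["SAAA", "AAAA"])
print(table, odds_ratio(*table), fisher_two_sided(*table))
```

With real data, each of the 20 amino acids would get its own table, and residues with p-value < 0.05 would correspond to the circled cases in the panel.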
Each plot corresponds to the matches of one degron motif identified across the proteome. Degron probabilities are represented as a frequency histogram (solid light purple bars for motif matches and solid dark purple bars for annotated degron instances) and as the corresponding kernel-smoothed distribution (purple lines). Dashed vertical lines mark the point of the distribution that corresponds to the annotated degron with the lowest probability, which is used as the threshold to select high-confidence novel degron instances. In degron motifs with no annotated degron instance (that is, without a solid dark purple histogram), the threshold is set at the lowest degron probability of any annotated degron (that is, 0.65). Values for all individual degrons are presented in Supplementary Table 2 and Supplementary Data.
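The threshold rule described in this caption can be sketched as follows. This is only an illustration: the function names are hypothetical, and whether a match exactly at the threshold is retained (`>=` rather than `>`) is an assumption not stated in the caption.

```python
from typing import Dict, List

# Fallback quoted in the caption: the lowest degron probability of any
# annotated degron, used for motifs that have no annotated instance.
GLOBAL_MIN_ANNOTATED_PROBABILITY = 0.65


def select_threshold(annotated_probs: List[float],
                     fallback: float = GLOBAL_MIN_ANNOTATED_PROBABILITY) -> float:
    """Cutoff for one motif: the lowest probability among its annotated
    degron instances, or the global fallback if it has none."""
    if annotated_probs:
        return min(annotated_probs)
    return fallback


def high_confidence_matches(match_probs: Dict[str, float],
                            annotated_probs: List[float]) -> List[str]:
    """Return the motif matches whose probability reaches the cutoff.
    Assumption: matches exactly at the cutoff are kept (>=)."""
    threshold = select_threshold(annotated_probs)
    return sorted(m for m, p in match_probs.items() if p >= threshold)


# Toy example with made-up match identifiers and probabilities.
matches = {"match_a": 0.70, "match_b": 0.75, "match_c": 0.72}
print(high_confidence_matches(matches, annotated_probs=[0.90, 0.72]))
print(high_confidence_matches(matches, annotated_probs=[]))
```

In the first call the cutoff comes from the motif's own annotated instances; in the second the motif has none, so the global value of 0.65 applies.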
(a) Needle-plot representing the distribution of primary tumor mutations along the sequence of CTNNB1 (analogous to that of NFE2L2 in main Fig. 3a). (b) One recurrent mutation (S37C) projected onto the 3D structure of the CTNNB1-BTRC complex. (c, d) Comparisons of protein stability change upon mutations analogous to those represented in main Fig. 3e,f, restricted to tumors in which the gene harboring the degron under analysis is diploid. As in Fig. 3f, all p-values shown in this figure are derived from a one-tailed Mann-Whitney test. When two rows of p-values appear, the top value corresponds to the comparison between the distribution of stability change values of mutations in different groups and that of wild-type forms of the proteins, and the bottom value to the comparison with all missense mutations in the dataset. (e) Distribution of protein stability change caused by mutations in novel degron instances in different quartiles of degron probability. (f, g) Comparisons of protein stability upon mutations analogous to those represented in main Fig. 3e,f, but carried out using cancer cell line mutations. (h) Same as panel (e) for cancer cell line mutations. (i) Thirteen proteins carrying mutations in novel degron instances exhibit a clear trend towards stability increase (determined using mass-spectrometry rather than RPPA as in previous examples), although non-significant due to lack of statistical power. (j) Distribution of stability change of proteins with non-synonymous mutations in different quartiles of VAF (that is, present in different fractions of tumor cells) which do not overlap with known or novel degron instances. The p-values correspond to the comparison (one-tailed Mann-Whitney test) between the distribution of stability change values of mutations in each quartile with respect to wild-type forms of the proteins. N: number of mutations in groups (in all panels). Boxplots in all panels are defined as in Fig. 2.
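For readers unfamiliar with the test used throughout this figure, the following is a minimal sketch of a one-tailed Mann-Whitney comparison, asking whether stability change values in a mutated group tend to be greater than in a reference group. It uses a plain normal approximation without tie or continuity correction, which is a simplification and not necessarily how the p-values in the figure were computed; the function name and the toy values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_greater(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, p) for the one-tailed alternative 'x tends to exceed y'.

    U counts the (x_i, y_j) pairs with x_i > y_j, with ties counted as 0.5.
    The p-value uses the normal approximation of U under the null
    (no tie or continuity correction), so it is rough for small samples.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p


# Toy values (not data from the study): stability change in a mutated
# group versus a wild-type reference group.
mutated = [3.0, 4.0, 5.0]
wild_type = [0.0, 1.0, 2.0]
u_stat, p_value = mann_whitney_greater(mutated, wild_type)
print(u_stat, p_value)
```

For real analyses an established implementation with exact or tie-corrected p-values would be preferable to this approximation.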
Identification of annotated degrons in CTNNB1 (a), NFE2L2 (b), MET (c), PRKCA (d), BRAF (e) and ARAF (f) using the approach devised to identify de novo degrons. The panels follow the same composition and color codes as those in Fig. 4f,g. In parentheses, the names of the corresponding antibodies. N: number of tumor samples in each group.
(a, b) QQ-plots relating the observed and expected distributions of p-values produced by the SMDeg (a) and FMDeg (b) tests on the TCGA pan-cancer cohort. N: number of tumor samples. (c, d) Novel degron instances that appear significant (FDR < 1%) in the SMDeg (c), or significant (FDR < 10%) or nearly significant (FDR < 25%) in the FMDeg test (d) across cancer cell lines. N: number of cancer cell lines. (e) De novo degron instances that appear significant (FDR < 1%) in the SMDeg test across TCGA primary tumors. N: number of tumor samples. (f–h) Needle-plots representing the distribution of mutations in cancer cell lines along the sequences of ETV5 (f; significant in SMDeg), CCND3 (g; significant in SMDeg and FMDeg), USP36 (h; significant in SMDeg).
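The coordinates of a QQ-plot such as those in panels (a, b) can be derived as in the sketch below, which pairs each sorted observed p-value with its expected quantile under a uniform null distribution. The plotting position rank/(n+1) and the -log10 scale are common conventions assumed here, not details given in the caption; the function name is hypothetical.

```python
import math
from typing import List, Tuple


def qq_points(p_values: List[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) pairs on the -log10 scale.

    Expected quantiles assume p-values are uniform on (0, 1) under the
    null, using the plotting position rank / (n + 1).
    All p-values must be strictly positive.
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Toy p-values (not results from the study).
for expected, observed in qq_points([0.5, 0.05, 0.005]):
    print(f"expected={expected:.3f} observed={observed:.3f}")
```

Points lying close to the diagonal indicate a well-calibrated test, while points rising above it at the smallest p-values indicate candidates under selection.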
(a) Driver E3s across cancer cell lines are identified through signals of positive selection detected by OncodriveFML and dNdScv. Analogous to main Fig. 6a. N: number of cancer cell lines. (b) The combination of the two methods of positive selection employed yields 37 driver E3s across primary tumors. The size of the driver E3s correlates with their mutation frequency across TCGA samples. (c) Overlap between the lists of driver E3s identified in the study (red), annotated in the Cancer Gene Census (green) or identified in a recent analysis (ref. 15) of TCGA datasets (blue).
(a) The bars represent the proportion of tumors in each cohort with CCNE1 alterations that could be targeted directly via CDK inhibitors (dark blue), or with alterations of FBXW7, with (medium blue) or without (light blue) increased stability of CCNE1 which could in principle be targeted indirectly. In parentheses, number of tumor samples in each cohort. (b) Mean percentage (and standard deviations as whiskers) of driver mutations in either driver E3s or driver degrons that do not occur in known cancer genes. In parentheses, number of tumor samples in each cohort.
Supplementary Tables 1–6
Raw files containing proteome-wide annotated matches of degron motifs, degrons and E3s under positive selection. A README file contains a detailed description of the files enclosed within the zip file.
About this article
Cite this article
Martínez-Jiménez, F., Muiños, F., López-Arribillaga, E. et al. Systematic analysis of alterations in the ubiquitin proteolysis system reveals its contribution to driver mutations in cancer. Nat Cancer 1, 122–135 (2020). https://doi.org/10.1038/s43018-019-0001-2