Identifying cancer driver mutations is essential for understanding disease biology and devising effective therapies, but it remains a complex endeavor. A focused analytical approach is now presented that defines driver mutations affecting ubiquitin-mediated proteolysis through machine learning and mining of cancer multi-omics data.
In the post-genomic era, patients’ tumor DNA is routinely sequenced to guide the selection of treatment options. An urgent gap between cancer genomics and precision cancer medicine is the identification of driver somatic mutations1. Driver mutations are typically defined as somatic variants that have critical functional effects that confer selective advantages to cells during clonal evolution and are likely to have roles in cancer progression or treatment responses2. In contrast, most somatic mutations observed in tumors are ‘passenger’ mutations, which have no biological effects and are therefore neutral during cancer evolution. Although many computational algorithms have been developed to distinguish driver from passenger mutations3, the predictive power of these algorithms is generally limited4, not only because the number of driver mutations is very low but more importantly because the diverse biological effects of driver mutations make them difficult to capture in a general algorithm. Therefore, an approach focusing on a specific biological process with well-defined characteristics may be more effective. In an elegant example of such a focused approach, in this issue of Nature Cancer, Martínez-Jiménez et al. present a systematic pan-cancer analysis of cancer somatic mutations in the ubiquitin-mediated proteolysis system (UPS)5.
Ubiquitin-mediated proteolysis plays an important role in controlling the abundance and spatiotemporal distribution of proteins in a broad range of cellular processes6. An essential step in this complicated process is the recognition of specific target proteins for degradation by binding of a cognate E3 ubiquitin ligase (E3) to short sequences of the target protein, termed degrons. Subsequently, through the action of two enzymes (E1 and E2), ubiquitin is transferred to a lysine residue of the target protein and serves as a signal for proteasomal degradation. Altered E3 target recognition, as a result of somatic mutations in either the E3s or degron sites, can lead to abnormal stabilization of oncoproteins, thereby contributing to tumorigenesis7 (Fig. 1). However, a systematic analysis of such alterations has been challenging because of the very limited knowledge of degron sites in the human proteome. To tackle this challenge, the authors first trained a random-forest classifier based on the 11 biochemical features of a relatively small set of known degron sites and computationally predicted a much larger set of >20,000 novel degron sites. Integrating parallel messenger RNA (through mRNA sequencing) and protein expression data (through reverse-phase protein array analysis8) from The Cancer Genome Atlas (TCGA), the authors next developed a quantitative metric to assess the potential effects of mutated degrons on protein stability. Through a pan-cancer analysis of ~7,000 TCGA primary tumors and ~900 cell lines from the Cancer Cell Line Encyclopedia, this approach largely validated the functionality of the computationally identified degrons. Furthermore, highly stabilizing mutations that did not overlap with annotated degrons were used to identify de novo degrons of unknown motifs.
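The idea of scoring protein stability from paired mRNA and protein data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple linear relationship between mRNA and protein levels fitted on wild-type samples only, and the function names (`fit_line`, `stability_change`) and the toy values are hypothetical.

```python
# Hypothetical sketch: how far does protein abundance in mutated samples
# deviate from the level expected given their mRNA expression?

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def stability_change(mrna, protein, mutated):
    """Mean protein residual of mutated samples relative to a wild-type fit."""
    wild_type = [(m, p) for m, p, mut in zip(mrna, protein, mutated) if not mut]
    a, b = fit_line([m for m, _ in wild_type], [p for _, p in wild_type])
    residuals = [p - (a + b * m)
                 for m, p, mut in zip(mrna, protein, mutated) if mut]
    return sum(residuals) / len(residuals)

# Toy data (assumed values): the last sample carries a degron mutation and
# shows more protein than its mRNA level would predict.
mrna = [1.0, 2.0, 3.0, 4.0, 2.5]
protein = [1.0, 2.0, 3.0, 4.0, 4.0]
mutated = [False, False, False, False, True]

score = stability_change(mrna, protein, mutated)
print(score)  # positive values suggest stabilization of the protein
```

A positive score indicates more protein than expected from transcript levels, which is the pattern one would expect if a mutation disrupts E3 recognition of a degron.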
Armed with these better-annotated degron sites, the authors developed two computational methods to detect driver degrons on the basis of deviations in abundance or the predicted functional effects of missense mutations in degron sites. This analysis identified 35 driver-degron candidates under positive selection, thus substantially enlarging the list of known driver degrons. Finally, focusing on the effects of mutated E3 drivers on the protein stability of downstream genes, the authors identified novel E3–target links. Collectively, this study estimated that the combined driver mutations contributed by E3s and degrons represent >10% of driver mutations in well-studied cancer genes, thus highlighting a previously underappreciated role of altered ubiquitin-mediated proteolysis in cancer biology9.
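To give a feel for what detecting a deviation in mutation abundance can involve, the sketch below uses a plain one-sided binomial test. This is a simple technique swapped in for illustration, not the method used in the study: it assumes that, under neutrality, missense mutations fall uniformly along the protein, and the function names and input numbers are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def degron_enrichment(n_mut_protein, n_mut_degron, degron_len, protein_len):
    """One-sided p-value for an excess of missense mutations in a degron.

    Assumption: under neutrality, each mutation in the protein lands in the
    degron with probability degron_len / protein_len.
    """
    p_neutral = degron_len / protein_len
    return binomial_tail(n_mut_degron, n_mut_protein, p_neutral)

# Toy example (assumed numbers): 8 of 20 missense mutations fall within a
# 10-residue degron of a 500-residue protein.
p_value = degron_enrichment(n_mut_protein=20, n_mut_degron=8,
                            degron_len=10, protein_len=500)
print(p_value)
```

A real analysis would need a background model that accounts for sequence context and mutational signatures, and would correct for testing many degrons at once.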
The implications of this study are manifold. First, by using a machine-learning approach, the authors have provided a much more comprehensive list (>100-fold longer than the previous list10) of degron sites in the human proteome, thereby advancing basic understanding of protein structure and function. Second, using mutation data across many tumors, the authors have developed novel methods to generate a catalog of driver degrons and E3 drivers, thereby directly assessing the contribution of altered E3–target interactions in tumorigenesis. Importantly, the results not only identified driver events but also generated a testable hypothesis associated with each driver, thereby facilitating future experimental investigations. Finally, the study provides a theoretical basis for exploring the clinical utility of targeting downstream effects by using loss-of-function of E3 ligases, thus suggesting a strategy for repurposing anti-cancer drugs.
This study also suggests an effective approach for identifying cancer driver mutations in a specific biological process in general. According to the example set here by Martínez-Jiménez et al., three key components are required for such an approach. The first component is a detailed molecular-level picture of the key biological process and the related downstream effects. In this case, the roles of various components in the UPS and the principle underlying E3–target recognition are well established. The second component is accurate annotation of the functional elements involved in the biological process, so that further analysis can focus on the well-annotated target regions that are presumably enriched in positively selected signals, thus boosting the signal-to-noise ratio for discovery. In this regard, machine learning often provides very powerful approaches to extend a small set of known cases to a much larger genome-wide set, as demonstrated here. Third, being able to quantify the functional consequences of a driver mutation affecting a biological process is crucial. In the case of the UPS, altered E3–target interactions would affect the protein level but not the mRNA level. Thus, the deviation in protein expression from the expected levels inferred from mRNA expression would be a good indicator of the functionality of a driver mutation. This aspect is key to increasing screening power and gaining related mechanistic insights for driver mutations through this analytical workflow. Multi-dimensional omics data available for the same sample cohorts (for example, TCGA and Cancer Cell Line Encyclopedia) provide a rich context for such information borrowing. Given the tremendous complexity of biological processes and the diversified roles of driver mutations, designing a single robust algorithm with both high sensitivity and specificity to detect cancer mutations in a general sense may be difficult. 
However, as exemplified in the present study, each biological process may be addressed individually through a focused approach. Thus, a more comprehensive list of cancer driver mutations can be obtained by using a ‘divide-and-conquer’ strategy.
1. Cheng, F., Liang, H., Butte, A. J., Eng, C. & Nussinov, R. Pharmacol. Rev. 71, 1–19 (2019).
2. Martincorena, I. & Campbell, P. J. Science 349, 1483–1489 (2015).
3. Bailey, M. H. et al. Cell 173, 371–385.e18 (2018).
4. Ng, P. K. et al. Cancer Cell 33, 450–462.e10 (2018).
5. Martínez-Jiménez, F., Muiños, F., López-Arribillaga, E., Lopez-Bigas, N. & Gonzalez-Perez, A. Nat. Cancer https://doi.org/10.1038/s43018-019-0001-2 (2019).
6. Laney, J. D. & Hochstrasser, M. Cell 97, 427–430 (1999).
7. Meszaros, B., Kumar, M., Gibson, T. J., Uyar, B. & Dosztanyi, Z. Sci. Signal. 10, eaak9982 (2017).
8. Li, J. et al. Nat. Methods 10, 1046–1047 (2013).
9. Ge, Z. et al. Cell Rep. 23, 213–226.e3 (2018).
10. Dinkel, H. et al. Nucleic Acids Res. 44, D294–D300 (2016).
The author declares no competing interests.
Liang, H. Finding cancer drivers in the UPS system. Nat Cancer 1, 20–21 (2020). https://doi.org/10.1038/s43018-019-0013-y