Comparison of algorithms for the detection of cancer drivers at subgene resolution

Porta-Pardo, Eduard; Kamburov, Atanas; Tamborero, David; Pons, Tirso; Grases, Daniela; Valencia, Alfonso; Lopez-Bigas, Nuria; Getz, Gad; Godzik, Adam

doi:10.1038/nmeth.4364

Analysis
Published: 17 July 2017

Nature Methods volume 14, pages 782–788 (2017)

Abstract

Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally, most algorithms for cancer-driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for nonrandom distribution of mutations within proteins as a signal for the driving role of mutations in cancer. Here we classify and review such subgene-resolution algorithms, compare their findings on four distinct cancer data sets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes.
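The idea of testing for a nonrandom distribution of mutations within a protein can be illustrated with a minimal sketch. It assumes a simple binomial null model in which each mutation falls in a region with probability equal to the region's share of the protein length. This is not the implementation of any algorithm compared in the article; the function names, the toy protein length and the mutation positions are all hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_enrichment(mutation_positions, region_start, region_end, protein_length):
    """One-sided test for an excess of mutations in one protein region.

    Assumption (illustrative only): under the null, mutations are spread
    uniformly along the protein, so the chance of hitting the region equals
    region length / protein length.
    """
    n = len(mutation_positions)
    k = sum(region_start <= pos <= region_end for pos in mutation_positions)
    p = (region_end - region_start + 1) / protein_length
    return k, n, binomial_tail(k, n, p)


# Hypothetical protein of 500 residues with a 50-residue region (10% of the length).
clustered = [105, 110, 110, 112, 118, 120, 120, 125, 130, 131, 140, 149,
             12, 60, 210, 260, 300, 355, 410, 480]
k, n, p_clustered = region_enrichment(clustered, 100, 149, 500)

# The same number of mutations spread evenly gives no signal for the region.
spread = list(range(10, 500, 25))
_, _, p_spread = region_enrichment(spread, 100, 149, 500)
```

In this toy case 12 of 20 mutations fall in a region covering 10% of the protein, which is very unlikely under the uniform null, whereas the evenly spread mutations are not. A whole-gene test would see the same total of 20 mutations in both cases, which is the motivation for looking at subgene resolution.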

**Figure 1: Finding mutation drivers across biological scales.**

**Figure 2: Comparison of the overall predictions of each method.**

**Figure 3: Evaluating the predictions of each method and type of algorithm based on CGC data.**

**Figure 4: Using mutation clusters to improve the definition of cancer drivers.**

Accession codes

Accessions

Protein Data Bank

3NJP

Acknowledgements

We would like to thank the people working at The Cancer Genome Atlas for their efforts and for making all the data publicly available. E.P.-P. and A.G. acknowledge the support from the Cancer Center grants P30 CA030199 (to our institute) and R35 GM118187 (A.G.). A.K. was supported by startup funds of G.G. and by a collaboration with Bayer AG. D.T. is supported by project SAF2015-74072-JIN, which is funded by the Agencia Estatal de Investigacion (AEI) and Fondo Europeo de Desarrollo Regional (FEDER). N.L.-B. acknowledges funding from the European Research Council (consolidator grant 682398). A.V. and T.P. acknowledge funding by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 305444 (RD-Connect).

Author information

Eduard Porta-Pardo
Present address: Barcelona Supercomputing Centre (BSC), Barcelona, Spain
Tirso Pons
Present address: Stem cells and Immunity Laboratory, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain

Authors and Affiliations

Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California, USA
Eduard Porta-Pardo, Daniela Grases & Adam Godzik
Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts, USA
Atanas Kamburov & Gad Getz
Harvard Medical School, Boston, Massachusetts, USA
Atanas Kamburov & Gad Getz
Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
Atanas Kamburov & Gad Getz
Department of Experimental and Health Sciences, University Pompeu Fabra (UPF), Barcelona, Spain
David Tamborero & Nuria Lopez-Bigas
Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
David Tamborero & Nuria Lopez-Bigas
Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Tirso Pons
Barcelona Supercomputing Centre (BSC), Barcelona, Spain
Alfonso Valencia
Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
Alfonso Valencia & Nuria Lopez-Bigas

Contributions

E.P.-P. and A.G. conceived the project. E.P.-P., D.T. and T.P. researched the data for the article. E.P.-P., A.K. and D.T. analyzed the data. All authors were involved in writing the article and reviewed and edited the manuscript before submission.

Corresponding author

Correspondence to Adam Godzik.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Coverage of the human proteome by different types of biological features.

Fraction of the proteome that is covered by linear regions (left), structures with over 95% sequence identity between the protein and the template (middle) or structures with a BLAST e-value between the protein and the template below 1e-9 (right). The fraction is calculated both for the absolute number of proteins (left columns) and for the total number of residues (right columns). The distinction between the two is important because the structure is usually known for only a fraction of the protein.
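The difference between protein-level and residue-level coverage can be made concrete with a small sketch. It assumes that a protein counts as covered if at least one of its residues is covered; the function name and the toy numbers are hypothetical and are not taken from the article's data.

```python
def coverage_fractions(proteins):
    """Compute coverage at the protein level and at the residue level.

    `proteins` is a list of (length, covered_residues) pairs. A protein is
    counted as covered when at least one residue is covered (an assumption
    made for this illustration).
    """
    total_residues = sum(length for length, _ in proteins)
    covered_proteins = sum(1 for _, covered in proteins if covered > 0)
    covered_residues = sum(covered for _, covered in proteins)
    return covered_proteins / len(proteins), covered_residues / total_residues


# Toy proteome: three of four proteins have some structural coverage,
# but most of them are covered only partially.
toy_proteome = [(1000, 200), (500, 500), (800, 0), (700, 100)]
protein_fraction, residue_fraction = coverage_fractions(toy_proteome)
```

Here three quarters of the proteins have some coverage, yet only 800 of 3,000 residues are covered, so the residue-level fraction is much lower than the protein-level one.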

Supplementary Figure 2 Results of the different algorithms in the BLCA dataset.

Visualization is limited to genes detected by at least 4 methods or known drivers in BLCA detected by at least one algorithm.

Supplementary Figure 3 Results of the different algorithms in the BRCA dataset.

Visualization is limited to genes detected by at least 4 methods or known drivers in BRCA detected by at least one algorithm.

Supplementary Figure 4 Results of the different algorithms in the LUAD dataset.

Visualization is limited to genes detected by at least 4 methods or known drivers in LUAD detected by at least one algorithm.

Supplementary Figure 5 Sub-gene resolution algorithms detect more oncogenes than tumor-suppressors.

(a) Barplot showing the fraction of genes detected by each method that are oncogenes, tumor-suppressors, have a dual-role or whose mode of action is not yet known. (b) Fold-enrichment of each method in detected oncogenes or genes with dual-role when aggregating all four datasets.

Supplementary Figure 6 Description of the datasets.

(a) Mutation types in each patient of the different datasets. The majority of mutations are missense. (b) Number of patients (top) and violin plot showing the distribution of number of mutations (bottom) in each dataset. Each dot represents a sample.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 (PDF 1200 kb)

Supplementary Table 1

Availability and statistical tests used by each method. (XLSX 13 kb)

Supplementary Table 2

Performance of the different methods in the BLCA dataset. (XLSX 10 kb)

Supplementary Table 3

Performance of the different methods on the BRCA dataset. (XLSX 10 kb)

Supplementary Table 4

Performance of the different methods on the GBM dataset. (XLSX 10 kb)

Supplementary Table 5

Performance of the different methods on the LUAD dataset. (XLSX 11 kb)

Supplementary Table 6

Driver genes not detected by whole-gene methods. (XLSX 26 kb)

Supplementary Table 7

Candidate novel driver genes detected only by sub-gene resolution algorithms. (XLSX 17 kb)

About this article

Cite this article

Porta-Pardo, E., Kamburov, A., Tamborero, D. et al. Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 14, 782–788 (2017). https://doi.org/10.1038/nmeth.4364

Received: 18 January 2017
Accepted: 16 June 2017
Published: 17 July 2017
Issue Date: 01 August 2017
DOI: https://doi.org/10.1038/nmeth.4364
