Computational approaches to identify functional genetic variants in cancer genomes

Journal name: Nature Methods
Year published: Published online


The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority of these drive tumor progression. We present the results of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and we recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.


Author information

  1. These authors contributed equally to this work.

    • Abel Gonzalez-Perez,
    • Ville Mustonen,
    • Boris Reva &
    • Graham R S Ritchie


  1. Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain.

    • Abel Gonzalez-Perez &
    • Nuria Lopez-Bigas
  2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Ville Mustonen,
    • Graham R S Ritchie,
    • Adam Butler &
    • Serge Dronov
  3. Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, USA.

    • Boris Reva &
    • Chris Sander
  4. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Graham R S Ritchie &
    • Paul Flicek
  5. Cellular Signal Integration Group, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.

    • Pau Creixell &
    • Rune Linding
  6. Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, USA.

    • Rachel Karchin &
    • Hannah Carter
  7. Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.

    • Miguel Vazquez &
    • Alfonso Valencia
  8. Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St. Lucia, Brisbane, Queensland, Australia.

    • J Lynn Fink,
    • Karin S Kassahn &
    • John V Pearson
  9. The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.

    • Gary D Bader,
    • Jüri Reimand &
    • Lincoln D Stein
  10. Ontario Institute for Cancer Research, Toronto, Ontario, Canada.

    • Paul C Boutros,
    • Lakshmi Muthuswamy &
    • B F Francis Ouellette
  11. Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.

    • Paul C Boutros &
    • Lakshmi Muthuswamy
  12. Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada.

    • Paul C Boutros
  13. Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada.

    • B F Francis Ouellette
  14. Division of Cancer Genomics, National Cancer Center, Chuo-ku, Tokyo, Japan.

    • Tatsuhiro Shibata
  15. Spanish National Bioinformatics Institute, Madrid, Spain.

    • Alfonso Valencia
  16. Cambridge Research Institute, Cambridge, UK.

    • Nick B Shannon
  17. The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Li Ding
  18. Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Li Ding
  19. Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA.

    • Josh M Stuart
  20. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California, USA.

    • Josh M Stuart
  21. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

    • Lincoln D Stein
  22. Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.

    • Nuria Lopez-Bigas


  1. The International Cancer Genome Consortium Mutation Pathways and Consequences Subgroup of the Bioinformatics Analyses Working Group

    • Abel Gonzalez-Perez,
    • Ville Mustonen,
    • Boris Reva,
    • Graham R S Ritchie,
    • Pau Creixell,
    • Rachel Karchin,
    • Miguel Vazquez,
    • J Lynn Fink,
    • Karin S Kassahn,
    • John V Pearson,
    • Gary D Bader,
    • Paul C Boutros,
    • Lakshmi Muthuswamy,
    • B F Francis Ouellette,
    • Jüri Reimand,
    • Rune Linding,
    • Tatsuhiro Shibata,
    • Alfonso Valencia,
    • Adam Butler,
    • Serge Dronov,
    • Paul Flicek,
    • Nick B Shannon,
    • Hannah Carter,
    • Li Ding,
    • Chris Sander,
    • Josh M Stuart,
    • Lincoln D Stein &
    • Nuria Lopez-Bigas

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

Supplementary information

PDF files

  1. Supplementary Text and Figures (129.3 KB)

    Supplementary Tables 2–4

Excel files

  1. Supplementary Table 1 (11.2 KB)

    Sequence Ontology (SO) terms used to describe the effect of mutations.
