  • Review Article

Expanding the computational toolbox for mining cancer genomes

Key Points

  • High-throughput sequencing of cancer genomes, exomes and transcriptomes has enabled the identification of many novel somatic aberrations, and has provided new insights into cancer biology and new therapeutic targets.

  • Computational and statistical tools are necessary for interpreting the large and complex data sets that result from high-throughput sequencing approaches.

  • Mature software tools for detecting single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions in cancer genomes are now available. Challenges remain in increasing the sensitivity and specificity of these algorithms.

  • Computational techniques are essential for prioritizing the somatic aberrations that are most likely to be functional, so that these can be taken forward for experimental validation. Two common approaches are to predict the functional impact of individual mutations using prior biological knowledge, and to identify genes, pathways and networks that are recurrently mutated across many samples.

  • Algorithms that infer the clonal structure and evolutionary history of a tumour from ultra-deep sequencing data have recently been introduced. Applications of these techniques have shown that mutations present in a minority of cells in a primary tumour may come to be present in the majority of cells at relapse or in metastasis.

  • Sequencing of cancer genomes has revealed a wide range of specialized mutational processes, including kataegis, chromothripsis and chromoplexy, that result in rapid genomic changes and punctuated tumour evolution.
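
The idea of identifying recurrently mutated genes can be illustrated with a deliberately simple sketch. It assumes a single uniform background mutation rate per base and asks how surprising the observed mutation count in each gene is under a binomial model. The gene names, lengths, counts and rate below are hypothetical placeholders, and the model ignores the gene-specific and sample-specific factors that published significance tools account for, as well as multiple-testing correction.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > n * p and term <= 1e-17 * total:
            break
    return min(total, 1.0)


def rank_genes(mutation_counts, gene_lengths, num_samples, background_rate):
    """Rank genes by how unlikely their mutation count is under the background rate.

    Assumption: every base in every sample mutates independently with the
    same probability `background_rate` (a strong simplification).
    """
    results = []
    for gene, count in mutation_counts.items():
        n_sites = gene_lengths[gene] * num_samples
        p_value = binomial_upper_tail(count, n_sites, background_rate)
        results.append((gene, p_value))
    return sorted(results, key=lambda item: item[1])


# Hypothetical cohort: 100 tumours, background rate of 1e-6 mutations per base.
ranked = rank_genes(
    mutation_counts={"GENE_A": 12, "GENE_B": 1},
    gene_lengths={"GENE_A": 1200, "GENE_B": 3000},
    num_samples=100,
    background_rate=1e-6,
)
for gene, p_value in ranked:
    print(f"{gene}\t{p_value:.3g}")
```

In this toy cohort GENE_A carries far more mutations than the background rate predicts and is ranked first, whereas the single mutation in GENE_B is unremarkable.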

Abstract

High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and for determining clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and of their association with clinical properties. Here, we review cancer genomics software and the insights that have been gained from its application.
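
As a rough illustration of what determining clonal architecture involves, the sketch below converts the variant allele fraction (VAF) of a somatic mutation into an estimated cancer cell fraction (CCF) and labels the mutation as clonal or subclonal. It is a minimal sketch rather than the method of any particular tool: it assumes that tumour purity, local tumour copy number and the number of mutated copies per cell are already known, and the 0.9 clonality cut-off is an arbitrary illustrative choice.

```python
def cancer_cell_fraction(vaf, purity, tumour_copy_number=2, multiplicity=1,
                         normal_copy_number=2):
    """Estimate the fraction of cancer cells that carry a mutation.

    Model (assumed): the expected VAF equals
        purity * CCF * multiplicity
        / (purity * tumour_copy_number + (1 - purity) * normal_copy_number)
    and is solved here for CCF.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be in [0, 1]")
    # Average number of copies of the locus per cell across the whole sample.
    average_copies = (purity * tumour_copy_number
                      + (1.0 - purity) * normal_copy_number)
    ccf = vaf * average_copies / (purity * multiplicity)
    # Sampling noise can push the estimate above 1, so cap it.
    return min(ccf, 1.0)


def classify_mutation(vaf, purity, clonal_threshold=0.9, **copy_state):
    """Label a mutation 'clonal' or 'subclonal' using a hypothetical CCF cut-off."""
    ccf = cancer_cell_fraction(vaf, purity, **copy_state)
    return "clonal" if ccf >= clonal_threshold else "subclonal"


# Hypothetical heterozygous mutations in a copy-neutral region, 80% tumour purity.
for vaf in (0.40, 0.10):
    ccf = cancer_cell_fraction(vaf, purity=0.8)
    print(f"VAF={vaf:.2f}  CCF={ccf:.2f}  {classify_mutation(vaf, purity=0.8)}")
```

Comparing such estimates for the same mutation in a primary tumour and in a relapse sample is one simple way to see a minority mutation rising to the majority.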

Figure 1: Sample procurement, sequencing and analysis roadmap.
Figure 2: Biological factors relevant to assessing significantly mutated genes in cancer.
Figure 3: Significantly mutated genes, pathways and networks.
Figure 4: A conceptual example of clonal evolution model and clonality analyses.

References

  1. Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977).

    CAS  PubMed  Google Scholar 

  2. Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. 1977. Biotechnology 24, 104–108 (1992).

    CAS  PubMed  Google Scholar 

  3. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    CAS  PubMed  Google Scholar 

  4. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  5. Shendure, J. & Lieberman Aiden, E. The expanding scope of DNA sequencing. Nature Biotech. 30, 1084–1094 (2012).

    CAS  Google Scholar 

  6. Majewski, J., Schwartzentruber, J., Lalonde, E., Montpetit, A. & Jabado, N. What can exome sequencing do for you? J. Med. Genet. 48, 580–589 (2011).

    CAS  PubMed  Google Scholar 

  7. Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nature Rev. Genet. 12, 87–98 (2011).

    CAS  PubMed  Google Scholar 

  8. Krueger, F., Kreck, B., Franke, A. & Andrews, S. R. DNA methylome analysis using short bisulfite sequencing data. Nature Methods 9, 145–151 (2012).

    CAS  PubMed  Google Scholar 

  9. Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999–1005 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Nowell, P. C. The clonal evolution of tumor cell populations. Science 194, 23–28 (1976).

    CAS  PubMed  Google Scholar 

  11. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  12. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  13. Navin, N. et al. Inferring tumor progression from genomic heterogeneity. Genome Res. 20, 68–80 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  14. Navin, N. E. & Hicks, J. Tracing the tumor lineage. Mol. Oncol. 4, 267–283 (2010).

    PubMed  PubMed Central  Google Scholar 

  15. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  16. Hou, Y. et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148, 873–885 (2012).

    CAS  PubMed  Google Scholar 

  17. Xu, X. et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 148, 886–895 (2012).

    CAS  PubMed  Google Scholar 

  18. Gundry, M., Li, W., Maqbool, S. B. & Vijg, J. Direct, genome-wide assessment of DNA mutations in single cells. Nucleic Acids Res. 40, 2032–2040 (2012).

    CAS  PubMed  Google Scholar 

  19. Baslan, T. et al. Genome-wide copy number analysis of single cells. Nature Protoc. 7, 1024–1041 (2012).

    CAS  Google Scholar 

  20. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  21. Kim, S. Y. & Speed, T. P. Comparing somatic mutation-callers: beyond Venn diagrams. BMC Bioinformatics 14, 189 (2013).

    PubMed  PubMed Central  Google Scholar 

  22. Goode, D. L. et al. A simple consensus approach improves somatic mutation prediction accuracy. Genome Med. 5, 90 (2013).

    PubMed  PubMed Central  Google Scholar 

  23. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). The GATK is a broad and widely used toolkit for variant discovery and data processing.

    CAS  PubMed  PubMed Central  Google Scholar 

  24. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  25. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012). VarScan (described in references 24 and 25) is one of the early programs for somatic SNV detection and has since added additional capability for germline, copy-number and indel events.

    CAS  PubMed  PubMed Central  Google Scholar 

  26. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). SAMtools is a broad set of utilities for processing sequence data in the standardized SAM/BAM format, including variant calling.

    PubMed  PubMed Central  Google Scholar 

  27. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).

    CAS  PubMed  Google Scholar 

  28. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotech. 31, 213–219 (2013). MuTect is a widely used program for identifying somatic SNVs in tumour–normal pair sequencing data.

    CAS  Google Scholar 

  29. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012).

    CAS  PubMed  Google Scholar 

  30. Goya, R. et al. SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics 26, 730–736 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  31. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  32. Lunter, G. Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes. Bioinformatics 23, i289–i296 (2007).

    CAS  PubMed  Google Scholar 

  33. Cartwright, R. A. Problems and solutions for estimating indel rates and length distributions. Mol. Biol. Evol. 26, 473–480 (2009).

    CAS  PubMed  Google Scholar 

  34. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  35. Smith, C. C. et al. Validation of ITD mutations in FLT3 as a therapeutic target in human acute myeloid leukaemia. Nature 485, 260–263 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  36. Spencer, D. H. et al. Detection of FLT3 internal tandem duplication in targeted, short-read-length, next-generation sequencing data. J. Mol. Diagn. 15, 81–93 (2013).

    CAS  PubMed  Google Scholar 

  37. Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  38. Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009). Pindel is focused on identifying breakpoints at single-base-resolution of indels, inversions and tandem duplications.

    CAS  PubMed  PubMed Central  Google Scholar 

  39. Ye, K., Kosters, W. A. & Ijzerman, A. P. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences. Bioinformatics 23, 687–693 (2007).

    CAS  PubMed  Google Scholar 

  40. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  41. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303.3997 [q-bio. GN] (2013).

  42. Chen, K. et al. TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res. 24, 310–317 (2014).

    PubMed  PubMed Central  Google Scholar 

  43. Bignell, G. R. et al. Signatures of mutation and selection in the cancer genome. Nature 463, 893–898 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  45. Yoon, S., Xuan, Z., Makarov, V., Ye, K. & Sebat, J. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 19, 1586–1592 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  46. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008).

    CAS  PubMed  Google Scholar 

  47. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007). GISTIC is one of the standard tools for finding genes that are affected by CNAs which have a bearing on cancer initiation or progression.

    CAS  PubMed  Google Scholar 

  48. Zhang, Q. et al. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 26, 464–469 (2010).

    CAS  PubMed  Google Scholar 

  49. Raphael, B. J., Volik, S., Collins, C. & Pevzner, P. A. Reconstructing tumor genome architectures. Bioinformatics 19 (Suppl. 2), ii162–ii171 (2003).

    PubMed  Google Scholar 

  50. Raphael, B. J. et al. A sequence-based survey of the complex structural organization of tumor genomes. Genome Biol. 9, R59 (2008).

    PubMed  PubMed Central  Google Scholar 

  51. Volik, S. et al. Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res. 16, 394–404 (2006).

    CAS  PubMed  PubMed Central  Google Scholar 

  52. Volik, S. et al. End-sequence profiling: sequence-based analysis of aberrant genomes. Proc. Natl Acad. Sci. USA 100, 7696–7701 (2003).

    PubMed  Google Scholar 

  53. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods 6, 677–681 (2009). BreakDancer is a general tool for identifying structural variations (including insertions, deletions, inversions and translocations) using the concept of discordant read pairs.

    CAS  PubMed  PubMed Central  Google Scholar 

  55. Wang, J. et al. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nature Methods 8, 652–654 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  56. Hormozdiari, F., Alkan, C., Eichler, E. E. & Sahinalp, S. C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 19, 1270–1278 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  57. Sindi, S., Helman, E., Bashir, A. & Raphael, B. J. A geometric approach for classification and comparison of structural variants. Bioinformatics 25, i222–i230 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  58. Sindi, S. S., Onal, S., Peng, L. C., Wu, H. T. & Raphael, B. J. An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol. 13, R22 (2012).

    PubMed  PubMed Central  Google Scholar 

  59. Handsaker, R. E., Korn, J. M., Nemesh, J. & McCarroll, S. A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nature Genet. 43, 269–276 (2011).

    CAS  PubMed  Google Scholar 

  60. Rowley, J. D. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973).

    CAS  PubMed  Google Scholar 

  61. Huang, M. E. et al. Use of all-trans retinoic acid in the treatment of acute promyelocytic leukemia. Blood 72, 567–572 (1988).

    CAS  PubMed  Google Scholar 

  62. Huang, M. E. [Treatment of acute promyelocytic leukemia with all-trans retinoic acid]. Zhonghua Yi Xue Za Zhi 68, 131–133, 10 (in Chinese) (1988).

    CAS  PubMed  Google Scholar 

  63. Tomlins, S. A. et al. Integrative molecular concept modeling of prostate cancer progression. Nature Genet. 39, 41–51 (2007).

    CAS  PubMed  Google Scholar 

  64. Kim, Y. K. et al. Cooperation of H2O2-mediated ERK activation with Smad pathway in TGF-β1 induction of p21WAF1/Cip1. Cell. Signall. 18, 236–243 (2006).

    CAS  Google Scholar 

  65. McPherson, A. et al. deFuse: an algorithm for gene fusion discovery in tumor RNA-seq data. PLoS Comput. Biol. 7, e1001138 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  66. Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).

    PubMed  PubMed Central  Google Scholar 

  67. Iyer, M. K., Chinnaiyan, A. M. & Maher, C. A. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 27, 2903–2904 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. Chen, K. et al. BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data. Bioinformatics 28, 1923–1924 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214–220 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  70. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  71. McPherson, A. et al. Comrad: detection of expressed rearrangements by integrated analysis of RNA-seq and low coverage genome sequence data. Bioinformatics 27, 1481–1488 (2011).

    CAS  PubMed  Google Scholar 

  72. McPherson, A. et al. nFuse: discovery of complex genomic rearrangements in cancer using high-throughput sequencing. Genome Res. 22, 2250–2261 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. Chen, K. et al. BreakTrans: uncovering the genomic architecture of gene fusions. Genome Biol. 14, R87 (2013).

    PubMed  PubMed Central  Google Scholar 

  74. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010). ANNOVAR is a versatile and widely used tool for functional annotation of variants. It is often accessed through its web interface wANNOVAR.

    PubMed  PubMed Central  Google Scholar 

  75. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SNPeff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  76. Woolfe, A., Mullikin, J. C. & Elnitski, L. Genomic features defining exonic variants that modulate splicing. Genome Biol. 11, R20 (2010).

    PubMed  PubMed Central  Google Scholar 

  77. Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).

    PubMed  PubMed Central  Google Scholar 

  78. Chelala, C., Khan, A. & Lemoine, N. R. SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics 25, 655–661 (2009).

    CAS  PubMed  Google Scholar 

  79. Yandell, M. et al. A probabilistic disease-gene finder for personal genomes. Genome Res. 21, 1529–1542 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  80. Paila, U., Chapman, B. A., Kirchner, R. & Quinlan, A. R. GEMINI: integrative exploration of genetic variation and genome annotations. PLoS Comput. Biol. 9, e1003153 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  81. Nakken, S., Alseth, I. & Rognes, T. Computational prediction of the effects of non-synonymous single nucleotide polymorphisms in human DNA repair genes. Neuroscience 145, 1273–1279 (2007). PolyPhen is a concatenation of 'polymorphism phenotyping' and predicts the impact of amino acid changes on proteins. It is often used in conjunction with SIFT.

    CAS  PubMed  Google Scholar 

  82. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003). SIFT infers whether amino acid substitution has an effect on subsequent functioning of proteins and is often used in conjunction with PolyPhen.

    CAS  PubMed  PubMed Central  Google Scholar 

  83. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  84. Gonzalez-Perez, A. & Lopez-Bigas, N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am. J. Hum. Genet. 88, 440–449 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  85. Wong, W. C. et al. CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer. Bioinformatics 27, 2147–2148 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  86. Carter, H. et al. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667 (2009). CHASM (described in references 85 and 86) is a popular tool for assessing functional impact of somatic missense mutations on the basis of whether they confer selective advantage on cancerous cells.

    CAS  PubMed  PubMed Central  Google Scholar 

  87. Gonzalez-Perez, A., Deu-Pons, J. & Lopez-Bigas, N. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med. 4, 89 (2012).

    PubMed  PubMed Central  Google Scholar 

  88. Gonzalez-Perez, A. & Lopez-Bigas, N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 40, e169 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  89. Reimand, J. & Bader, G. D. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Systems Biol. 9, 637 (2013).

    Google Scholar 

  90. Greenman, C., Wooster, R., Futreal, P. A., Stratton, M. R. & Easton, D. F. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics 173, 2187–2198 (2006).

    CAS  PubMed  PubMed Central  Google Scholar 

  91. Getz, G. et al. Comment on “The consensus coding sequences of human breast and colorectal cancers”. Science 317, 1500 (2007).

    CAS  PubMed  Google Scholar 

  92. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  94. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).

  95. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).

  96. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  97. Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).

  98. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  99. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  100. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  101. Ye, J., Pavlicek, A., Lunney, E. A., Rejto, P. A. & Teng, C. H. Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics 11, 11 (2010).

    PubMed  PubMed Central  Google Scholar 

  102. Ryslik, G. A., Cheng, Y., Cheung, K. H., Modis, Y. & Zhao, H. Utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 14, 190 (2013).

    PubMed  PubMed Central  Google Scholar 

  103. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  104. Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. & Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355–D360 (2010).

    CAS  PubMed  Google Scholar 

  105. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Gene Ontol. Consort. Nature Genet. 25, 25–29 (2000).

    CAS  Google Scholar 

  106. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    CAS  PubMed  Google Scholar 

  107. Lin, J. et al. A multidimensional analysis of genes mutated in breast and colorectal cancers. Genome Res. 17, 1304–1318 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  108. Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).

    PubMed  Google Scholar 

  109. Wendl, M. C. et al. PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics 27, 1595–1602 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  110. Boca, S. M., Kinzler, K. W., Velculescu, V. E., Vogelstein, B. & Parmigiani, G. Patient-oriented gene set analysis for cancer mutation data. Genome Biol. 11, R112 (2010).

    PubMed  PubMed Central  Google Scholar 

  111. Peri, S. et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 13, 2363–2371 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2011).

    CAS  PubMed  Google Scholar 

  113. Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 41, D816–D823 (2013).

    CAS  PubMed  Google Scholar 

  114. Franceschini, A. et al. STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013).

    CAS  PubMed  Google Scholar 

  115. Das, J. & Yu, H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Systems Biol. 6, 92 (2012).

    Google Scholar 

  116. Razick, S., Magklaras, G. & Donaldson, I. M. iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008).

    PubMed  PubMed Central  Google Scholar 

  117. Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

    Google Scholar 

  118. Khurana, E., Fu, Y., Chen, J. & Gerstein, M. Interpretation of genomic variants using a unified biological network approach. PLoS Comput. Biol. 9, e1002886 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  119. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).

    CAS  PubMed  Google Scholar 

  120. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).

  121. Hofree, M., Shen, J. P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nature Methods 10, 1108–1115 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  122. Ciriello, G., Cerami, E., Sander, C. & Schultz, N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 22, 398–406 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  123. Vogelstein, B. & Kinzler, K. W. Cancer genes and the pathways they control. Nature Med. 10, 789–799 (2004).

    CAS  PubMed  Google Scholar 

  124. Yeang, C. H., McCormick, F. & Levine, A. Combinatorial patterns of somatic gene mutations in cancer. Faseb J. 22, 2605–2622 (2008).

    CAS  PubMed  Google Scholar 

  125. Paull, E. O. et al. Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE). Bioinformatics 29, 2757–2764 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  126. Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. Saal, L. H. et al. PIK3CA mutations correlate with hormone receptors, node metastasis, and ERBB2, and are mutually exclusive with PTEN loss in human breast carcinoma. Cancer Res. 65, 2554–2559 (2005).

    CAS  PubMed  Google Scholar 

  128. Vandin, F., Upfal, E. & Raphael, B. J. De novo discovery of mutated driver pathways in cancer. Genome Res. 22, 375–385 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  129. Leiserson, M. D., Blokh, D., Sharan, R. & Raphael, B. J. Simultaneous identification of multiple driver pathways in cancer. PLoS Comput. Biol. 9, e1003054 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  130. Miller, C. A., Settle, S. H., Sulman, E. P., Aldape, K. D. & Milosavljevic, A. Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors. BMC Med. Genom. 4, 34 (2011).

    Google Scholar 

  131. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  132. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  133. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  134. Albertson, D. G., Collins, C., McCormick, F. & Gray, J. W. Chromosome aberrations in solid tumors. Nature Genet. 34, 369–376 (2003).

    CAS  PubMed  Google Scholar 

  135. Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  136. Maher, C. A. & Wilson, R. K. Chromothripsis and human disease: piecing together the shattering process. Cell 148, 29–32 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  137. Forment, J. V., Kaidi, A. & Jackson, S. P. Chromothripsis and cancer: causes and consequences of chromosome shattering. Nature Rev. Cancer 12, 663–670 (2012).

    CAS  Google Scholar 

  138. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  139. Malhotra, A. et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 23, 762–776 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  140. Sorzano, C. O., Pascual-Montano, A., Sanchez de Diego, A., Martinez, A. C. & van Wely, K. H. Chromothripsis: breakage–fusion–bridge over and over again. Cell Cycle 12, 2016–2023 (2013).

  141. Korbel, J. O. & Campbell, P. J. Criteria for inference of chromothripsis in cancer genomes. Cell 152, 1226–1236 (2013).

  142. Oesper, L., Ritz, A., Aerni, S. J., Drebin, R. & Raphael, B. J. Reconstructing cancer genomes from paired-end sequencing data. BMC Bioinformatics 13 (Suppl. 6), S10 (2012).

  143. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).

  144. Keats, J. J. et al. Clonal competition with alternating dominance in multiple myeloma. Blood 120, 1067–1076 (2012).

  145. Turke, A. B. et al. Preexistence and clonal selection of MET amplification in EGFR mutant NSCLC. Cancer Cell 17, 77–88 (2010).

  146. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065–1076 (1962).

  147. Rosenblatt, M. Remarks on some non-parametric estimates of a density function. Ann. Math. Statist. 27, 832–837 (1956).

  148. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotech. 30, 413–421 (2012).

  149. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486, 395–399 (2012).

  150. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  151. Oesper, L., Mahmoody, A. & Raphael, B. J. THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 14, R80 (2013).

  152. Gonzalez-Perez, A. et al. Computational approaches to identify functional genetic variants in cancer genomes. Nature Methods 10, 723–729 (2013).

  153. Raphael, B. J., Dobson, J. R., Oesper, L. & Vandin, F. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med. 6, 5 (2014).

  154. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

  155. Kolata, G. In Treatment for Leukemia, Glimpses of the Future. The New York Times A1 (7 July 2012).

  156. Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).

  157. Wendl, M. C. & Wilson, R. K. Aspects of coverage in medical DNA sequencing. BMC Bioinformatics 9, 239 (2008).

  158. Bashir, A., Volik, S., Collins, C., Bafna, V. & Raphael, B. J. Evaluation of paired-end sequencing strategies for detection of genome rearrangements in cancer. PLoS Comput. Biol. 4, e1000051 (2008).

  159. Wendl, M. C. & Wilson, R. K. Statistical aspects of discerning indel-type structural variation via DNA sequence alignment. BMC Genomics 10, 359 (2009).

  160. Boffetta, P. & Nyberg, F. Contribution of environmental factors to cancer risk. Br. Med. Bull. 68, 71–94 (2003).

  161. Cerwenka, A. & Lanier, L. L. Natural killer cells, viruses and cancer. Nature Rev. Immunol. 1, 41–49 (2001).

  162. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).

  163. Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).

  164. Parkin, D. M. The global health burden of infection-associated cancers in the year 2002. Int. J. Cancer 118, 3030–3044 (2006).

  165. Kostic, A. D. et al. PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nature Biotech. 29, 393–396 (2011).

  166. Bhaduri, A., Qu, K., Lee, C. S., Ungewickell, A. & Khavari, P. A. Rapid identification of non-human sequences in high-throughput sequencing datasets. Bioinformatics 28, 1174–1175 (2012).

  167. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).

  168. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  169. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010).

  170. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).

  171. Tamborero, D., Lopez-Bigas, N. & Gonzalez-Perez, A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS ONE 8, e55489 (2013).

  172. Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).

Acknowledgements

This work was supported by the US National Human Genome Research Institute (grants U01HG006517 to L.D.; R01HG005690 and R01HG007069 to B.J.R.) and by the US National Cancer Institute (grant R01CA180006 to L.D.). The authors thank K. Ye and M. D. McLellan for comments.

Author information

Corresponding author

Correspondence to Li Ding.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Pyrosequencing

A specific sequencing-by-synthesis method in which detection is based on chemiluminescent signals from luciferin conversion.

Sequencing-by-ligation

A sequencing method based on the mismatch sensitivity of DNA ligase to detect nucleotides.

Sequencing-by-synthesis

A sequencing method that uses sequential polymerization of nucleotides to a template, in which each incorporation is inferred by an imaging process, usually from a fluorescent dye attached to the added nucleotide.

Driver mutations

Somatic mutations that have causal roles in initiation, progression, metastasis or recurrence of cancer.

Significantly mutated genes (SMGs)

Genes with rates of somatic mutations that are higher than the random background rates, which suggests a role in tumour initiation or progression.

Sequence coverage theory

A theory that characterizes sequencing processes mathematically to support development of detection methods, as well as analysis and design of sequencing projects.

Type I errors

Errors made when an effect is declared although none actually exists, which lead to false positives.

Type II errors

Errors made when actual effects are overlooked, which lead to false negatives.

Paired-end mapping

Coordinated mapping of both sequenced ends of a fragment to a reference genome, in which the approximately known separation between the two ends provides extra information against misalignments.

Gapped alignment

An alignment process in which small gaps are allowed if they support a better fit.

Split read

The phenomenon in which a read spans a deleted site, whereby the read appears to be split in its alignment to a reference.

De novo assembly

Reconstruction of a genomic target by assessing consensus sequence from alignments of overlapping reads and clones.

Precision

The fraction of the total number of called events that are true.

Passenger mutations

Somatic mutations that arise incidentally and that have no mechanistic role in cancer initiation or progression.

Background mutation rate (BMR)

The rate at which spontaneous mutations occur as a result of uncorrected copying errors.

Kataegis

The appearance of regions of localized hypermutation in a tumour genome.

Chromothripsis

A catastrophic mutational event that 'shatters' one or more chromosomes, which leads to simultaneous loss and rearrangement of multiple chromosomal segments.

Chromoplexy

A mutational event that results in substantial and complex rearrangements that involve multiple loci, although it is not as severe as chromothripsis and involves less clustering of rearrangement breakpoints.

Clonal evolution

The emergence of novel clones that have improved survival or propagational fitness according to the particular sets of somatic mutations that have accumulated in these clones.
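
The glossary entries for Type I errors, Type II errors and Precision can be made concrete with a small worked example. The sketch below is illustrative only: it assumes that called and true events can each be represented as a set of (chromosome, position) pairs, and the function names and toy data are hypothetical, not taken from any tool discussed in the article.

```python
def confusion_counts(called, truth):
    """Compare a set of called events with a set of true events."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true positives: called and real
    fp = len(called - truth)  # false positives, i.e. Type I errors
    fn = len(truth - called)  # false negatives, i.e. Type II errors
    return tp, fp, fn


def precision(tp, fp):
    """Fraction of all called events that are true."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when no events are called")
    return tp / (tp + fp)


# Hypothetical toy data: events as (chromosome, position) pairs.
truth = {("chr1", 100), ("chr1", 250), ("chr2", 75), ("chr3", 900)}
called = {("chr1", 100), ("chr2", 75), ("chr2", 80)}

tp, fp, fn = confusion_counts(called, truth)
print(f"TP={tp} FP={fp} FN={fn} precision={precision(tp, fp):.3f}")
```

In this toy case, two of the three calls are true, one call is a false positive and two true events are missed, so the precision is 2/3.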

About this article

Cite this article

Ding, L., Wendl, M., McMichael, J. et al. Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 15, 556–570 (2014). https://doi.org/10.1038/nrg3767
