Expanding the computational toolbox for mining cancer genomes

Ding, Li; Wendl, Michael C.; McMichael, Joshua F.; Raphael, Benjamin J.

doi:10.1038/nrg3767

Review Article
Published: 08 July 2014

Nature Reviews Genetics volume 15, pages 556–570 (2014)

Key Points

High-throughput sequencing of cancer genomes, exomes and transcriptomes has enabled the identification of many novel somatic aberrations, and has provided new insights into cancer biology and new therapeutic targets.
Computational and statistical tools are necessary for interpreting the large and complex data sets that result from high-throughput sequencing approaches.
Mature software tools for detecting single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions in cancer genomes are now available. Challenges remain in increasing the sensitivity and specificity of these algorithms.
Computational techniques are essential for prioritizing, for further experimental validation, the somatic aberrations that are likely to be functional. Two common approaches are to predict the functional impact of individual mutations using prior biological knowledge, and to identify recurrently mutated genes, pathways and networks across many samples.
Algorithms to infer the clonal structure and evolutionary history of a tumour from ultra-deep sequencing data have recently been introduced. Applications of these techniques have shown that mutations present in a minority of cells in a primary tumour may rise to the majority in relapse or metastasis.
Sequencing of cancer genomes has revealed a wide range of specialized mutational processes, including kataegis, chromothripsis and chromoplexy, that result in rapid genomic changes and punctuated tumour evolution.
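
As a rough illustration of the recurrence idea in the key points above, the following Python sketch asks whether a gene is mutated in more samples than a uniform background mutation rate would predict, using a one-sided binomial test. It is a minimal sketch under stated assumptions, not the method of any tool covered by this Review: the function names, gene names, gene lengths, counts and the per-base background rate are all hypothetical, and the model ignores the biological factors that affect background mutation rates in real data.

```python
from math import comb


def prob_mutated_by_chance(gene_length_bp, background_rate_per_bp):
    """Probability that one sample carries at least one background
    mutation somewhere in a gene of the given length (assumed model)."""
    return 1.0 - (1.0 - background_rate_per_bp) ** gene_length_bp


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def recurrence_p_value(mutated_samples, total_samples,
                       gene_length_bp, background_rate_per_bp):
    """One-sided p-value for seeing at least `mutated_samples` mutated
    samples if mutations arose only at the background rate."""
    p = prob_mutated_by_chance(gene_length_bp, background_rate_per_bp)
    return binomial_upper_tail(mutated_samples, total_samples, p)


# Hypothetical cohort: all numbers below are illustrative assumptions.
cohort_size = 100
background_rate = 1e-6  # assumed mutations per base per sample
genes = {
    "GENE_A": (1500, 12),   # (coding length in bp, mutated samples)
    "GENE_B": (30000, 4),
}

for name, (length_bp, n_mutated) in genes.items():
    p_value = recurrence_p_value(n_mutated, cohort_size, length_bp, background_rate)
    print(f"{name}: {n_mutated}/{cohort_size} samples mutated, p = {p_value:.3g}")
```

In this toy setting the short gene mutated in 12 samples receives a far smaller p-value than the long gene mutated in 4 samples, which shows why recurrence has to be judged against gene length and background rate rather than by raw counts alone.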

Abstract

High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and that determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and their association with clinical properties. Here, we review cancer genomics software and the insights that have been gained from its application.

**Figure 1: Sample procurement, sequencing and analysis roadmap.**

**Figure 2: Biological factors relevant to assessing significantly mutated genes in cancer.**

**Figure 3: Significantly mutated genes, pathways and networks.**

**Figure 4: A conceptual example of clonal evolution model and clonality analyses.**

Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).
CAS PubMed PubMed Central Google Scholar
Parkin, D. M. The global health burden of infection-associated cancers in the year 2002. Int. J. Cancer 118, 3030–3044 (2006).
CAS PubMed Google Scholar
Kostic, A. D. et al. PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nature Biotech. 29, 393–396 (2011).
CAS Google Scholar
Bhaduri, A., Qu, K., Lee, C. S., Ungewickell, A. & Khavari, P. A. Rapid identification of non-human sequences in high-throughput sequencing datasets. Bioinformatics 28, 1174–1175 (2012).
CAS PubMed PubMed Central Google Scholar
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
PubMed PubMed Central Google Scholar
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
CAS PubMed Google Scholar
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010).
CAS PubMed PubMed Central Google Scholar
McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).
CAS PubMed PubMed Central Google Scholar
Tamborero, D., Lopez-Bigas, N. & Gonzalez-Perez, A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS ONE 8, e55489 (2013).
CAS PubMed PubMed Central Google Scholar
Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).
CAS PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the US National Human Genome Research Institute (grants U01HG006517 to L.D.; R01HG005690 and R01HG007069 to B.J.R.) and by the US National Cancer Institute (grant R01CA180006 to L.D.). The authors thank K. Ye and M. D. McLellan for comments.

Author information

Authors and Affiliations

The Genome Institute, Washington University in St. Louis, 4444 Forest Park Ave., St. Louis, 63108, Missouri, USA
Li Ding, Michael C. Wendl & Joshua F. McMichael
Department of Medicine, Washington University in St. Louis, 660 S. Euclid Ave., St. Louis, 63110, Missouri, USA
Li Ding
Department of Genetics, Washington University in St. Louis, 660 S. Euclid Ave., St. Louis, 63110, Missouri, USA
Li Ding & Michael C. Wendl
Siteman Cancer Center, Washington University in St. Louis, 4921 Parkview Place, St. Louis, 63110, Missouri, USA
Li Ding
Department of Mathematics, Washington University in St. Louis, 1 Brookings Drive, St. Louis, 63130, Missouri, USA
Michael C. Wendl
Department of Computer Science and Center for Computational Molecular Biology, Brown University, 115 Waterman Street, Providence, 02912, Rhode Island, USA
Benjamin J. Raphael

Corresponding author

Correspondence to Li Ding.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Pyrosequencing: A specific sequencing-by-synthesis method in which detection is based on chemiluminescent signals from luciferin conversion.
Sequencing-by-ligation: A sequencing method based on the mismatch sensitivity of DNA ligase to detect nucleotides.
Sequencing-by-synthesis: A sequencing method that uses sequential polymerization of nucleotides to a template, in which each incorporation is inferred by an imaging process, usually from a fluorescent dye attached to the added nucleotide.
Driver mutations: Somatic mutations that have causal roles in initiation, progression, metastasis or recurrence of cancer.
Significantly mutated genes (SMGs): Genes with rates of somatic mutations that are higher than the random background rates, which suggests a role in tumour initiation or progression.
Sequence coverage theory: A theory that characterizes sequencing processes mathematically to support development of detection methods, as well as analysis and design of sequencing projects.
Type I errors: Errors made when an effect is declared although none actually exists, which lead to false positives.
Type II errors: Errors made when actual effects are overlooked, which lead to false negatives.
Paired-end mapping: Coordinated mapping of both sequenced ends of a fragment to a reference genome, in which the approximately known separation between the two ends provides extra information against misalignments.
Gapped alignment: An alignment process in which small gaps are allowed if they support a better fit.
Split read: The phenomenon in which a read spans a deleted site, whereby the read appears to be split in its alignment to a reference.
De novo assembly: Reconstruction of a genomic target by assessing consensus sequence from alignments of overlapping reads and clones.
Precision: The fraction of the total number of called events that are true.
Passenger mutations: Somatic mutations that arise incidentally and that have no mechanistic role in cancer initiation or progression.
Background mutation rate (BMR): The rate at which spontaneous mutations occur as a result of uncorrected copying errors.
Kataegis: The appearance of regions of local hypermutation in a tumour genome.
Chromothripsis: A catastrophic mutational event that 'shatters' one or more chromosomes, which leads to simultaneous loss and rearrangement of multiple chromosomal segments.
Chromoplexy: A mutational event that results in substantial and complex rearrangements that involve multiple loci, although it is not as severe as chromothripsis and involves less clustering of rearrangement breakpoints.
Clonal evolution: The emergence of novel clones that have improved survival or propagational fitness according to the particular sets of somatic mutations that have accumulated in these clones.
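
The glossary definitions of precision, type I errors and type II errors can be made concrete with a small worked example. The sketch below is illustrative only: it assumes that a set of variant calls has been compared with a validated truth set, and the class name CallCounts, the function names and the counts themselves are hypothetical rather than taken from any tool discussed in this article. The two error rates are computed here as the conventional false-positive and false-negative rates, which is one common way to quantify these error types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Counts from comparing variant calls with a truth set (hypothetical)."""
    true_pos: int   # called and real
    false_pos: int  # called but not real (type I errors)
    false_neg: int  # real but not called (type II errors)
    true_neg: int   # not called and not real


def precision(c: CallCounts) -> float:
    """Fraction of all called events that are true."""
    called = c.true_pos + c.false_pos
    return c.true_pos / called if called else float("nan")


def type_i_error_rate(c: CallCounts) -> float:
    """Fraction of non-events that were wrongly called (false-positive rate)."""
    negatives = c.false_pos + c.true_neg
    return c.false_pos / negatives if negatives else float("nan")


def type_ii_error_rate(c: CallCounts) -> float:
    """Fraction of real events that were missed (false-negative rate)."""
    positives = c.true_pos + c.false_neg
    return c.false_neg / positives if positives else float("nan")


# Made-up counts, for illustration only.
counts = CallCounts(true_pos=90, false_pos=10, false_neg=30, true_neg=870)
print("precision:", precision(counts))                    # 90 / 100
print("type I error rate:", type_i_error_rate(counts))    # 10 / 880
print("type II error rate:", type_ii_error_rate(counts))  # 30 / 120
```

With these counts, 90 of the 100 called events are real, so precision is high even though a quarter of the real events are missed. This shows why precision alone does not describe the sensitivity of a caller.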

About this article

Cite this article

Ding, L., Wendl, M., McMichael, J. et al. Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 15, 556–570 (2014). https://doi.org/10.1038/nrg3767

Published: 08 July 2014
Issue Date: August 2014
DOI: https://doi.org/10.1038/nrg3767
