Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

Journal name: Nature Genetics


Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.

At a glance


    Figure 1: HotNet2 pan-cancer analysis.

    (a) The pan-cancer mutation data comprise SNVs (nonsynonymous SNVs and small indels) and CNAs (amplifications and deletions) in 19,459 genes in 3,281 samples. (b) Removing hypermutator samples and genes with few RNA-seq reads in all tumor types leaves 11,565 genes in 3,110 samples for analysis, with a wide range in the number of samples having an SNV (x axis) or a CNA (y axis) in these genes. The number of samples with SNVs and/or CNAs is shown for each gene, with points colored by the total. (c) HotNet2 finds significantly mutated subnetworks using a diffusion process on a protein-protein interaction network. Each node (protein) is assigned a score (heat) according to the frequency and significance of SNVs or CNAs in the corresponding gene. Heat diffuses across the edges of the network. Subnetworks containing nodes that both send and receive a significant amount of heat (outlined) are reported. (d) Subnetworks identified by HotNet2 include genes with a wide range of heat scores, including both frequently mutated, known cancer-related genes (hot genes) and rarely mutated genes (cold genes) that are implicated because of their interactions with other mutated genes. Thus, HotNet2 delves into the long tail of rarely mutated genes by analyzing combinations of interacting genes.
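    The diffusion step described in panel c can be illustrated with a small sketch. This is not the HotNet2 implementation: it assumes a random walk with restart as the diffusion process, and the toy network, the heat scores, the restart probability `beta` and the threshold `delta` are all hypothetical values chosen for illustration. Genes are grouped when each sends more than `delta` heat toward the other, directly or through intermediates.

```python
from collections import defaultdict

def diffuse(adj, heat, beta=0.4, iters=200):
    """Return exchanged[(u, v)]: heat that source v deposits on node u.

    Each source is diffused by a random walk that restarts at the source
    with probability beta (an assumed value, not taken from the paper).
    Every node in adj must have at least one neighbor.
    """
    nodes = sorted(adj)
    exchanged = {}
    for src in nodes:
        # Start with all probability mass on the source node.
        p = {n: 1.0 if n == src else 0.0 for n in nodes}
        for _ in range(iters):
            nxt = {n: beta if n == src else 0.0 for n in nodes}
            for n in nodes:
                share = (1.0 - beta) * p[n] / len(adj[n])
                for nb in adj[n]:
                    nxt[nb] += share
            p = nxt
        for n in nodes:
            exchanged[(n, src)] = p[n] * heat[src]
    return exchanged

def hot_subnetworks(adj, heat, delta, beta=0.4):
    """Group nodes that both send and receive more than delta heat."""
    exchanged = diffuse(adj, heat, beta)
    succ = defaultdict(set)
    for (receiver, sender), amount in exchanged.items():
        if receiver != sender and amount > delta:
            succ[sender].add(receiver)  # directed edge: sender -> receiver

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nb in succ[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return seen

    reach = {n: reachable(n) for n in adj}
    # Strongly connected components: nodes that can reach each other.
    comps = {frozenset(m for m in adj if m in reach[n] and n in reach[m])
             for n in adj}
    return sorted(sorted(c) for c in comps if len(c) > 1)

# Hypothetical toy network: A and B are hot, C is cold but adjacent to B,
# D and E are cold and interact only with each other.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}
heat = {"A": 10.0, "B": 10.0, "C": 1.0, "D": 0.2, "E": 0.2}
hot = hot_subnetworks(adj, heat, delta=0.3)
print(hot)
```

    In this toy example the cold gene C is grouped with the hot genes A and B because of its interactions, whereas the isolated cold pair D and E exchanges too little heat to be reported.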

    Figure 2: Overview of HotNet2 pan-cancer results.

    (a) HotNet2 consensus subnetworks are arranged near the cancer types where they are enriched for mutations using a force-directed layout (BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; COADREAD, colon adenocarcinoma and rectum adenocarcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LAML, acute myeloid leukemia; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrioid carcinoma). Colored outlines surrounding each network indicate the cancer types that are enriched for mutations (corrected P < 0.05). Interactions between proteins in a subnetwork are derived from the three interaction networks used in our pan-cancer analysis. In the center, there are 13 linkers that are members of more than one consensus subnetwork; lines between linkers and other consensus subnetworks indicate protein-protein interactions between them. Genes with a single asterisk were significant by exactly one of GISTIC2, MuSiC, MutSigCV, Oncodrive or the list of driver genes in ref. 9, whereas genes with two asterisks were not reported by any of these methods. (b) Heat map of significant co-occurrence (yellow; lower triangle) and mutual exclusivity (blue; upper triangle) of mutations across all pan-cancer samples in the most frequently mutated HotNet2 pan-cancer consensus and condensin subnetworks (P < 0.01, Cochran-Mantel-Haenszel test). Black outlines indicate pairs of subnetworks that have P < 0.05 after multiple-hypothesis correction. (c) Mutual exclusivity and co-occurrence (P < 0.01, Fisher's exact test) within individual cancer types using the same color scheme as in b.
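    The per-cancer-type test named in panel c can be sketched as a one-sided Fisher's exact test on a 2 × 2 table of samples. The sketch below is a minimal illustration using only the Python standard library; the function names and the sample counts are hypothetical, and it covers neither the Cochran-Mantel-Haenszel test used across cancer types in panel b nor the multiple-hypothesis correction.

```python
from math import comb

def fisher_tails(both, only_a, only_b, neither):
    """One-sided Fisher's exact test on a 2 x 2 table of sample counts.

    both    : samples mutated in subnetwork A and in subnetwork B
    only_a  : samples mutated in A only
    only_b  : samples mutated in B only
    neither : samples mutated in neither
    Returns (p_exclusive, p_cooccur): the probability of observing at most
    (respectively at least) `both` doubly mutated samples when mutations in
    A and B are independent and the row and column totals are fixed.
    """
    total = both + only_a + only_b + neither
    n_a = both + only_a
    n_b = both + only_b

    def pmf(k):
        # Hypergeometric probability of exactly k doubly mutated samples.
        return comb(n_a, k) * comb(total - n_a, n_b - k) / comb(total, n_b)

    lowest = max(0, n_a + n_b - total)
    highest = min(n_a, n_b)
    p_exclusive = sum(pmf(k) for k in range(lowest, both + 1))
    p_cooccur = sum(pmf(k) for k in range(both, highest + 1))
    return p_exclusive, p_cooccur

# Hypothetical counts: 100 samples, 30 mutated in A, 30 in B, none in both.
p_exclusive, p_cooccur = fisher_tails(both=0, only_a=30, only_b=30, neither=40)
print(f"mutual exclusivity P = {p_exclusive:.2e}, co-occurrence P = {p_cooccur:.2f}")
```

    A small `p_exclusive` corresponds to a blue (mutually exclusive) cell in the heat map and a small `p_cooccur` to a yellow (co-occurring) cell.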

    Figure 3: HotNet2 pan-cancer subnetworks overlapping SWI/SNF and BAP1 complexes.

    (a) Subnetwork containing members of the SWI/SNF complex, including the BAF proteins ARID1A and ARID1B, the PBAF proteins PBRM1 and ARID2, the catalytic core member SMARCA4, SMARCB1 and ADNP. Top, mutation matrix showing the samples (colored by cancer type) with a mutation of the indicated type: full ticks represent SNVs, indels and splice-site mutations; upticks and downticks represent amplifications and deletions, respectively. A black dot marks a sample with an inactivating mutation in the gene, that is, at least one of the following: nonsense, frameshift indel, nonstop or splice site. The number of samples with mutations in a gene is given in parentheses; marks are defined as in Figure 2a. Bottom left, interactions between proteins in the subnetwork from each interaction network are colored according to mutually enriched cancer type with corresponding P values. PPI, protein-protein interaction. Bottom right, the PBRM1 protein sequence exhibited significant clustering of missense alterations (P = 1.6 × 10^−5) in a 105-amino-acid bromodomain, a region that was reported to be mutated in a different renal clear cell carcinoma cohort (ref. 39) but not in the TCGA KIRC publication (ref. 3). Splice-site mutations are annotated with the nearest exon, where +1 and +2 refer to the adjacent 3′ splice donor and −1 and −2 refer to the adjacent 5′ splice acceptor. (b) Subnetwork containing members of the BAP1 complex, including the core PR-DUB complex, composed of the deubiquitinating enzyme BAP1 and the Polycomb group proteins ASXL1 and ASXL2, as well as the BAP1-interacting proteins ANKRD17, FOXK1, FOXK2 and KDM1B. Colors, marks and panel organization are as in a. (c) Inactivating mutations across samples (columns) in the SWI/SNF and BAP1 complexes (rows) in KIRC. The bottom row shows the mRNA expression classification of each sample (ref. 3). The mutations in these complexes are significantly mutually exclusive in KIRC (P < 3.6 × 10^−4, Fisher's exact test, corrected), and the BAP1 complex is significantly enriched for mutations in the third expression subtype (P < 3.4 × 10^−8, Fisher's exact test).

    Figure 4: HotNet2 pan-cancer subnetworks overlapping the cohesin and condensin complexes.

    (a) Cohesin consensus subnetwork and its mutations. Colors and marks are as defined in Figure 3a. None of the genes are mutated in more than 1.9% of the samples, but the subnetwork is mutated in >4% of the samples in each cancer type. STAG1 exhibits significant (P < 6 × 10^−5) clustering of missense alterations across 135 residues (highlighted) in the Pfam-B domain (PFAM, PB002581), a pattern suggesting inactivation of the corresponding domain. (b) Condensin consensus subnetwork and its mutations. Top, mutation matrix showing five genes in the condensin I and condensin II complexes. Only one gene, SMC4, was significant by individual gene scores. Bottom left, a subnetwork consisting of NCAPD2 and SMC4, both members of condensin I, was significantly mutated in BLCA, and a subnetwork consisting of NCAPD3, NCAPG2 and NCAPH2, all members of condensin II, was significantly mutated in LUAD and LUSC. At the gene level, NCAPD2 was significantly mutated in BLCA, SMC4 was significantly mutated in BLCA and HNSC, NCAPD3 was significantly mutated in LUAD and NCAPG2 was significantly mutated in LUSC. Bottom right, NCAPH2 shows a significant (P < 2.6 × 10^−4) cluster of missense alterations between Arg551 and Ser556.
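    The contrast in panel a between low per-gene mutation frequencies and a higher subnetwork frequency follows from counting a sample once if any gene in the subnetwork is mutated. The sketch below illustrates that arithmetic with made-up data: the gene labels G1 to G4, the sample identifiers and the cohort size are hypothetical and do not come from the study.

```python
def gene_frequencies(mutated_samples, n_samples):
    """Fraction of samples mutated in each gene."""
    return {gene: len(samples) / n_samples
            for gene, samples in mutated_samples.items()}

def subnetwork_frequency(mutated_samples, n_samples):
    """Fraction of samples with a mutation in at least one gene.

    Samples mutated in several genes are counted once, so the result is
    the size of the union of the per-gene sample sets, not their sum.
    """
    covered = set().union(*mutated_samples.values())
    return len(covered) / n_samples

# Hypothetical cohort of 200 samples; sample 3 is mutated in both G1 and G2.
n_samples = 200
mutated_samples = {
    "G1": {1, 2, 3},
    "G2": {3, 4, 5},
    "G3": {6, 7, 8},
    "G4": {9, 10, 11},
}

per_gene = gene_frequencies(mutated_samples, n_samples)
coverage = subnetwork_frequency(mutated_samples, n_samples)
print(per_gene)
print(coverage)
```

    Here every gene is mutated in 1.5% of the samples, yet the four-gene subnetwork is mutated in 5.5% of them (11 of 200).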


  1. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  2. Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
  3. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
  4. Cancer Genome Atlas Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
  5. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
  6. Kandoth, C. et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
  7. Cancer Genome Atlas Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
  8. Stratton, M.R., Campbell, P.J. & Futreal, P.A. The cancer genome. Nature 458, 719–724 (2009).
  9. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
  10. Garraway, L.A. & Lander, E.S. Lessons from the cancer genome. Cell 153, 17–37 (2013).
  11. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
  12. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
  13. Zack, T.I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
  14. Weinstein, J.N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  15. Hanahan, D. & Weinberg, R.A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
  16. Vandin, F., Upfal, E. & Raphael, B.J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).
  17. Vandin, F., Clay, P., Upfal, E. & Raphael, B.J. Discovery of mutated subnetworks associated with clinical data in cancer. Pac. Symp. Biocomput. 2012, 55–66 (2012).
  18. Grasso, C.S. et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487, 239–243 (2012).
  19. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013).
  20. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
  21. Das, J. & Yu, H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6, 92 (2012).
  22. Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478–480 (2011).
  23. Khurana, E., Fu, Y., Chen, J. & Gerstein, M. Interpretation of genomic variants using a unified biological network approach. PLOS Comput. Biol. 9, e1002886 (2013).
  24. Razick, S., Magklaras, G. & Donaldson, I.M. iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008).
  25. Hoadley, K.A. et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158, 929–944 (2014).
  26. Gonzalez-Perez, A. & Lopez-Bigas, N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 40, e169 (2012).
  27. Tamborero, D., Lopez-Bigas, N. & Gonzalez-Perez, A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS ONE 8, e55489 (2013).
  28. Dees, N.D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
  29. Mermel, C.H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
  30. Ye, J., Pavlicek, A., Lunney, E.A., Rejto, P.A. & Teng, C.-H. Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics 11, 11 (2010).
  31. Ryslik, G.A., Cheng, Y., Cheung, K.-H., Modis, Y. & Zhao, H. Utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 14, 190 (2013).
  32. Yeang, C.-H., McCormick, F. & Levine, A. Combinatorial patterns of somatic gene mutations in cancer. FASEB J. 22, 2605–2622 (2008).
  33. Vandin, F., Upfal, E. & Raphael, B.J. De novo discovery of mutated driver pathways in cancer. Genome Res. 22, 375–385 (2012).
  34. Solis, L.M. et al. Nrf2 and Keap1 abnormalities in non–small cell lung carcinoma and association with clinicopathologic features. Clin. Cancer Res. 16, 3743–3753 (2010).
  35. Yamadori, T. et al. Molecular mechanisms for the regulation of Nrf2-mediated cell proliferation in non-small-cell lung cancers. Oncogene 31, 4768–4777 (2012).
  36. Thompson, B.A., Tremblay, V., Lin, G. & Bochar, D.A. CHD8 is an ATP-dependent chromatin remodeling factor that regulates β-catenin target genes. Mol. Cell. Biol. 28, 3894–3904 (2008).
  37. Greife, A. et al. Canonical Notch signalling is inactive in urothelial carcinoma. BMC Cancer 14, 628 (2014).
  38. Wilson, B.G. & Roberts, C.W.M. SWI/SNF nucleosome remodellers and cancer. Nat. Rev. Cancer 11, 481–492 (2011).
  39. Varela, I. et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469, 539–542 (2011).
  40. Kadoch, C. et al. Proteomic and bioinformatic analysis of mammalian SWI/SNF complexes identifies extensive roles in human malignancy. Nat. Genet. 45, 592–601 (2013).
  41. Sausen, M. et al. Integrated genomic analyses identify ARID1A and ARID1B alterations in the childhood cancer neuroblastoma. Nat. Genet. 45, 12–17 (2013).
  42. Tsurusaki, Y. et al. Mutations affecting components of the SWI/SNF complex cause Coffin-Siris syndrome. Nat. Genet. 44, 376–378 (2012).
  43. Mandel, S. & Gozes, I. Activity-dependent neuroprotective protein constitutes a novel element in the SWI/SNF chromatin remodeling complex. J. Biol. Chem. 282, 34448–34456 (2007).
  44. Steingart, R.A. & Gozes, I. Recombinant activity-dependent neuroprotective protein protects cells against oxidative stress. Mol. Cell. Endocrinol. 252, 148–153 (2006).
  45. Carbone, M. et al. BAP1 and cancer. Nat. Rev. Cancer 13, 153–159 (2013).
  46. Peña-Llopis, S. et al. BAP1 loss defines a new class of renal cell carcinoma. Nat. Genet. 44, 751–759 (2012).
  47. Fang, R. et al. Human LSD2/KDM1b/AOF1 regulates gene transcription by modulating intragenic H3K4me2 methylation. Mol. Cell 39, 222–233 (2010).
  48. Shi, Y. et al. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953 (2004).
  49. Xu, H., Tomaszewski, J.M. & McKay, M.J. Can corruption of chromosome cohesion create a conduit to cancer? Nat. Rev. Cancer 11, 199–210 (2011).
  50. Rubio, E.D. et al. CTCF physically links cohesin to chromatin. Proc. Natl. Acad. Sci. USA 105, 8309–8314 (2008).
  51. Schmidt, D. et al. A CTCF-independent role for cohesin in tissue-specific transcription. Genome Res. 20, 578–588 (2010).
  52. Kon, A. et al. Recurrent mutations in multiple components of the cohesin complex in myeloid neoplasms. Nat. Genet. 45, 1232–1237 (2013).
  53. Solomon, D.A. et al. Frequent truncating mutations of STAG2 in bladder cancer. Nat. Genet. 45, 1428–1430 (2013).
  54. Wood, A.J., Severson, A.F. & Meyer, B.J. Condensin and cohesin complexity: the expanding repertoire of functions. Nat. Rev. Genet. 11, 391–404 (2010).
  55. Hirano, T. Condensins: universal organizers of chromosomes with diverse functions. Genes Dev. 26, 1659–1678 (2012).
  56. Lapointe, J. et al. hCAP-D3 expression marks a prostate cancer subtype with favorable clinical behavior and androgen signaling signature. Am. J. Surg. Pathol. 32, 205–209 (2008).
  57. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).
  58. Mitra, K., Carvunis, A.-R., Ramesh, S.K. & Ideker, T. Integrative approaches for finding modular structure in biological networks. Nat. Rev. Genet. 14, 719–732 (2013).
  59. Chung, F. The heat kernel as the pagerank of a graph. Proc. Natl. Acad. Sci. USA 104, 19735–19740 (2007).
  60. Berkhin, P. Bookmark-Coloring algorithm for personalized PageRank computing. Internet Math. 3, 41–62 (2006).
  61. Huang, W., Sherman, B.T. & Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat. Protoc. 4, 44–57 (2009).
  62. Huang, W., Sherman, B.T. & Lempicki, R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
  63. Mootha, V.K. et al. PGC-1α–responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
  64. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
  65. Ciriello, G., Cerami, E.G., Sander, C. & Schultz, N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 22, 398–406 (2012).
  66. Shay, J.W., Zou, Y., Hiyama, E. & Wright, W.E. Telomerase and cancer. Hum. Mol. Genet. 10, 677–685 (2001).


Author information

  1. Present address: Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.

    • Fabio Vandin
  2. These authors contributed equally to this work.

    • Mark D M Leiserson &
    • Fabio Vandin


  1. Department of Computer Science, Brown University, Providence, Rhode Island, USA.

    • Mark D M Leiserson,
    • Fabio Vandin,
    • Hsin-Ta Wu,
    • Jason R Dobson,
    • Jonathan V Eldridge,
    • Jacob L Thomas,
    • Alexandra Papoutsaki,
    • Younhun Kim &
    • Benjamin J Raphael
  2. Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA.

    • Mark D M Leiserson,
    • Fabio Vandin,
    • Hsin-Ta Wu,
    • Jason R Dobson &
    • Benjamin J Raphael
  3. Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, USA.

    • Jason R Dobson
  4. Genome Institute, Washington University in St. Louis, St. Louis, Missouri, USA.

    • Beifang Niu,
    • Michael McLellan &
    • Li Ding
  5. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Michael S Lawrence &
    • Gad Getz
  6. Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, University Pompeu Fabra, Barcelona, Spain.

    • Abel Gonzalez-Perez,
    • David Tamborero &
    • Nuria Lopez-Bigas
  7. Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.

    • Yuwei Cheng
  8. Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA.

    • Gregory A Ryslik
  9. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.

    • Nuria Lopez-Bigas
  10. Department of Pathology, Massachusetts General Hospital, Boston, Massachusetts, USA.

    • Gad Getz
  11. Department of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

    • Li Ding
  12. Siteman Cancer Center, Washington University in St. Louis, St. Louis, Missouri, USA.

    • Li Ding


M.D.M.L., F.V., H.-T.W. and B.J.R. designed the HotNet2 algorithm. M.D.M.L., F.V., H.-T.W., J.R.D., J.V.E., J.L.T., Y.K. and B.J.R. performed pan-cancer network analysis, analyzed results and benchmarked algorithms. A.P., J.R.D., Y.C. and G.A.R. analyzed mutation clusters in genes. B.N., M.M. and L.D. provided MuSiC gene scores, assisted with figures and generated mutation validation data. M.S.L., G.G., A.G.-P., D.T. and N.L.-B. provided MutSigCV and Oncodrive gene scores. M.D.M.L., F.V., H.-T.W., J.R.D. and B.J.R. wrote the manuscript with input from all authors. B.J.R. conceived and supervised the project.

Competing financial interests

A patent application related to this work has been filed with the US Patent and Trademark Office (USPTO).


Supplementary information

PDF files

  1. Supplementary Text and Figures (14,625 KB)

    Supplementary Note and Supplementary Figures 1–30.

Excel files

  1. Supplementary Tables 1–23 and 25–39 (224 KB)

    Supplementary Tables 1–23 and 25–39.

  2. Supplementary Table 24 (373 KB)

    Mutual exclusivity and co-occurrence tests for pairs of genes within each pair of HotNet2-identified subnetworks across all pan-cancer samples.
