Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms

Abstract

The increase in available high-throughput molecular data creates computational challenges for the identification of cancer genes. Genetic as well as non-genetic causes contribute to tumorigenesis, and this necessitates the development of predictive models to effectively integrate different data modalities while being interpretable. We introduce EMOGI, an explainable machine learning method based on graph convolutional networks to predict cancer genes by combining multiomics pan-cancer data—such as mutations, copy number changes, DNA methylation and gene expression—together with protein–protein interaction (PPI) networks. EMOGI was on average more accurate than other methods across different PPI networks and datasets. We used layer-wise relevance propagation to stratify genes according to whether their classification was driven by the interactome or any of the omics levels, and to identify important modules in the PPI network. We propose 165 novel cancer genes that do not necessarily harbour recurrent alterations but interact with known cancer genes, and we show that they correspond to essential genes from loss-of-function screens. We believe that our method can open new avenues in precision oncology and be applied to predict biomarkers for other complex diseases.
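To make the core operation concrete, the following is a minimal sketch of a single graph convolution step over a toy PPI network. It assumes the standard propagation rule of Kipf and Welling (ref. 31), H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the abstract does not confirm that EMOGI uses exactly this form. The function names (`normalize_adjacency`, `gcn_layer`), the four-gene path graph and the random feature and weight matrices are all hypothetical and for illustration only.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^(-1/2) (A + I) D^(-1/2) for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each gene keeps its own features
    deg = a_hat.sum(axis=1)              # node degrees (at least 1 because of self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph convolution: average over neighbours, project linearly, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical toy PPI network: four genes connected as a path 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.random((4, 3))        # e.g. three omics-derived features per gene (made up)
weights = rng.normal(size=(3, 2))    # untrained weights, for shape illustration only

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
```

Each row of `hidden` mixes a gene's own features with those of its direct interaction partners, which is how a network-based model can flag a gene that interacts with known cancer genes even when the gene itself is rarely altered.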

Fig. 1: Schematic of the EMOGI framework.
Fig. 2: EMOGI outperforms previous methods in predicting cancer genes and benefits from both multiomics and network features.
Fig. 3: Model explanation of well-known cancer genes recapitulates their oncogenic molecular mechanisms.
Fig. 4: Newly predicted cancer genes (NPCGs) interact with known cancer genes (KCGs) and are more essential in tumour cell lines.
Fig. 5: Biclustering of genes and feature contributions reveals distinct classes of cancer genes with unique functional characteristics.
Fig. 6: EMOGI allows extraction of PPI network components corresponding to subnetworks important for cancer gene classification.

Data availability

All datasets used in this study are publicly available or available to research organizations, and are listed in Supplementary Section 2.7. The GitHub repository (https://github.com/schulter/EMOGI) contains manifest files that can be used to download TCGA data using the GDC Data Transfer Tool.

Code availability

The source code to train the EMOGI model and reproduce the results is available at https://github.com/schulter/EMOGI (ref. 95) and a compute capsule is available (ref. 96). The trained multiomics models for all six PPI networks can be downloaded from https://owww.molgen.mpg.de/sasse/EMOGI/.

References

  1. Garraway, L. A. & Lander, E. S. Lessons from the cancer genome. Cell 153, 17–37 (2013).
  2. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
  3. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  4. Vogelstein, B. et al. Cancer genome landscapes. Science 340, 1546–1558 (2013).
  5. Zhang, J. et al. International cancer genome consortium data portal-a one-stop shop for cancer genomics data. Database 2011, bar026 (2011).
  6. Cancer Genome Atlas Research Network, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  7. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
  8. Repana, D. et al. The network of cancer genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol. 20, 1–12 (2019).
  9. Sondka, Z. et al. The COSMIC cancer gene census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
  10. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
  11. Leiserson, M. D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
  12. Silverbush, D. et al. Simultaneous integration of multi-omics data improves the identification of cancer driver modules. Cell Syst. 8, 456–466.e5 (2019).
  13. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
  14. Tokheim, C. J., Papadopoulos, N., Kinzler, K. W., Vogelstein, B. & Karchin, R. Evaluating the evaluation of cancer driver genes. Proc. Natl Acad. Sci. USA 113, 14330–14335 (2016).
  15. Bell, C. C. & Gilan, O. Principles and mechanisms of non-genetic resistance in cancer. Brit. J. Cancer 122, 465–472 (2019).
  16. Bradner, J. E., Hnisz, D. & Young, R. A. Transcriptional addiction in cancer. Cell 168, 629–643 (2017).
  17. Baylin, S. B. & Jones, P. A. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).
  18. Gazzoli, I., Loda, M., Garber, J., Syngal, S. & Kolodner, R. D. A hereditary nonpolyposis colorectal carcinoma case associated with hypermethylation of the MLH1 gene in normal tissue and loss of heterozygosity of the unmethylated allele in the resulting microsatellite instability-high tumor. Cancer Res. 62, 3925–3928 (2002).
  19. Poi, M. J., Knobloch, T. J. & Li, J. Deletion of RDINK4/ARF enhancer: a novel mutation to ‘inactivate’ the INK4-ARF locus. DNA Repair 57, 50–55 (2017).
  20. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
  21. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
  22. Dang, C. V. MYC on the path to cancer. Cell 149, 22–35 (2012).
  23. Schuijers, J. et al. Transcriptional dysregulation of MYC reveals common enhancer-docking mechanism. Cell Rep. 23, 349–360 (2018).
  24. Cowen, L., Ideker, T., Raphael, B. J. & Sharan, R. Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 18, 551–562 (2017).
  25. Reyna, M. A., Leiserson, M. D. & Raphael, B. J. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 34, i972–i980 (2018).
  26. Rappoport, N. & Shamir, R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucl. Acids Res. 46, 10546–10562 (2018).
  27. Collier, O., Stoven, V. & Vert, J.-P. LOTUS: a single- and multitask machine learning algorithm for the prediction of cancer driver genes. PLoS Comput. Biol. 15, e1007381 (2019).
  28. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
  29. Bruna, J., Zaremba, W., Szlam, A. & LeCun, Y. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations 2014 (OpenReview, 2013).
  30. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 701–710 (ACM, 2014).
  31. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations 2017 1–10 (OpenReview, 2016).
  32. Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, 1–46 (2015).
  33. Gilpin, L. H. et al. Explaining explanations: an overview of interpretability of machine learning. In Proc. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics 80–89 (IEEE, 2019).
  34. Jamieson, C. Bad blood promotes tumour progression. Nature 549, 465–466 (2017).
  35. Patani, H. et al. Transition to naïve human pluripotency mirrors pan-cancer DNA hypermethylation. Nat. Commun. 11, 1–17 (2020).
  36. Page, L., Brin, S., Motwani, R. & Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web (Stanford Univ. InfoLab, 1998).
  37. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).
  38. Liu, Y., Sun, J. & Zhao, M. ONGene: a literature-based database for human oncogenes. J. Genet. Genom. 44, 119–121 (2017).
  39. Fodde, R. The APC gene in colorectal cancer. Eur. J. Cancer 38, 867–871 (2002).
  40. Khan, M. A., Chen, H. C., Zhang, D. & Fu, J. Twist: a molecular target in cancer therapeutics. Tumor Biol. 34, 2497–2506 (2013).
  41. Patwardhan, D., Mani, S., Passemard, S., Gressens, P. & El Ghouzzi, V. STIL balancing primary microcephaly and cancer. Cell Death Dis. 9, 65 (2018).
  42. Jinesh, G. G., Sambandam, V., Vijayaraghavan, S., Balaji, K. & Mukherjee, S. Molecular genetics and cellular events of K-Ras-driven tumorigenesis. Oncogene 37, 839–846 (2018).
  43. Chen, H. Z., Tsai, S. Y. & Leone, G. Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nat. Rev. Cancer 9, 785–797 (2009).
  44. Nevins, J. R. The Rb/E2F pathway and cancer. Human Mol. Genet. 10, 699–703 (2001).
  45. Li, Y. & Seto, E. HDACs and HDAC inhibitors in cancer development and therapy. Cold Spring Harb. Perspect. Med. https://doi.org/10.1101/cshperspect.a026831 (2016).
  46. Luo, R. X., Postigo, A. A. & Dean, D. C. Rb interacts with histone deacetylase to repress transcription. Cell 92, 463–473 (1998).
  47. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
  48. Kluger, Y., Basri, R., Chang, J. T. & Gerstein, M. Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 13, 703–716 (2003).
  49. Suvà, M. L., Riggi, N. & Bernstein, B. E. Epigenetic reprogramming in cancer. Science 340, 1567–1570 (2013).
  50. Keita, M. et al. Global methylation profiling in serous ovarian cancer is indicative for distinct aberrant DNA methylation signatures associated with tumor aggressiveness and disease progression. Gynecol. Oncol. 128, 356–363 (2013).
  51. Webber, B. R. et al. DNA methylation of Runx1 regulatory regions correlates with transition from primitive to definitive hematopoietic potential in vitro and in vivo. Blood 122, 2978–2986 (2013).
  52. Bissell, M. J. & Hines, W. C. Why don’t we get more cancer? A proposed role of the microenvironment in restraining cancer progression. Nat. Med. 17, 320–329 (2011).
  53. Yu, Y. et al. The inhibitory effects of COL1A2 on colorectal cancer cell proliferation, migration, and invasion. J. Cancer 9, 2953–2962 (2018).
  54. Sigismund, S., Avanzato, D. & Lanzetti, L. Emerging functions of the EGFR in cancer. Mol. Oncol. 12, 3–20 (2018).
  55. Oh, E.-S., Seiki, M., Gotte, M. & Chung, J. Cell adhesion in cancer. Int. J. Cell Biol. 2012, 965618 (2012).
  56. Xing, P. et al. Roles of low-density lipoprotein receptor-related protein 1 in tumors. Chinese J. Cancer https://doi.org/10.1186/s40880-015-0064-0 (2016).
  57. Pu, X. et al. Caspase-3 and caspase-8 expression in breast cancer: caspase-3 is associated with survival. Apoptosis 22, 357–368 (2017).
  58. Schramek, D. et al. Direct in vivo RNAi screen unveils myosin IIa as a tumor suppressor of squamous cell carcinomas. Science 343, 309–313 (2014).
  59. Wang, B. et al. MYH9 promotes growth and metastasis via activation of MAPK/AKT signaling in colorectal cancer. J. Cancer 10, 874–884 (2019).
  60. Chen, R., Zhao, W. Q., Fang, C., Yang, X. & Ji, M. Histone methyltransferase SETD2: a potential tumor suppressor in solid cancers. J. Cancer 11, 3349–3356 (2020).
  61. Klink, B. U., Gatsogiannis, C., Hofnagel, O., Wittinghofer, A. & Raunser, S. Structure of the human BBSome core complex. eLife 9, e53910 (2020).
  62. Yang, K. et al. Integrative analysis reveals CRHBP inhibits renal cell carcinoma progression by regulating inflammation and apoptosis. Cancer Gene Ther. 27, 607–618 (2020).
  63. Deng, L., Meng, T., Chen, L., Wei, W. & Wang, P. The role of ubiquitination in tumorigenesis and targeted drug discovery. Signal Transduct. Target. Ther. 5, 11 (2020).
  64. Li, Y., Lu, W., He, X., Schwartz, A. L. & Bu, G. LRP6 expression promotes cancer cell proliferation and tumorigenesis by altering β-catenin subcellular distribution. Oncogene 23, 9129–9135 (2004).
  65. Ding, Y. et al. Caprin-2 enhances canonical Wnt signaling through regulating LRP5/6 phosphorylation. J. Cell Biol. 182, 865–872 (2008).
  66. Tombran-Tink, J. & Barnstable, C. J. PEDF: a multifaceted neurotrophic factor. Nat. Rev. Neurosci. 4, 628–636 (2003).
  67. Lytle, N. K., Barber, A. G. & Reya, T. Stem cell fate in cancer growth, progression and therapy resistance. Nat. Rev. Cancer 18, 669–680 (2018).
  68. Schaefer, M. H., Serrano, L. & Andrade-Navarro, M. A. Correcting for the study bias associated with protein–protein interaction measurements reveals differences between protein degree distributions from different cancer types. Front. Genet. 6, 00260 (2015).
  69. Mourikis, T. P. et al. Patient-specific cancer genes contribute to recurrently perturbed pathways and establish therapeutic vulnerabilities in esophageal adenocarcinoma. Nat. Commun. 10, 3101 (2019).
  70. Shi, J. et al. YWHAZ promotes ovarian cancer metastasis by modulating glycolysis. Oncol. Rep. 41, 1101–1112 (2019).
  71. Vellingiri, B. et al. Understanding the role of the transcription factor Sp1 in ovarian cancer: from theory to practice. Int. J. Mol. Sci. 21, 1153 (2020).
  72. Wee, Y., Liu, Y., Lu, J., Li, X. & Zhao, M. Identification of novel prognosis-related genes associated with cancer using integrative network analysis. Sci. Rep. 8, 3233 (2018).
  73. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
  74. Wang, Q. et al. Data descriptor: unifying cancer and normal RNA sequencing data from different sources. Sci. Data 5, 1–8 (2018).
  75. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
  76. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucl. Acids Res. 47, D766–D773 (2019).
  77. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
  78. Kamburov, A. et al. ConsensusPathDB: toward a more complete picture of cell biology. Nucl. Acids Res. 39, D712–D717 (2011).
  79. Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucl. Acids Res. 47, D607–D613 (2019).
  80. Razick, S., Magklaras, G. & Donaldson, I. M. iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008).
  81. Khurana, E., Fu, Y., Chen, J. & Gerstein, M. Interpretation of genomic variants using a unified biological network approach. PLoS Comput. Biol. 9, e1002886 (2013).
  82. Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495.e5 (2018).
  83. Kim, J. et al. DigSee: disease gene search engine with evidence sentences (version cancer). Nucl. Acids Res. 41, W510–W517 (2013).
  84. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 28, 27–30 (2000).
  85. McKusick, V. A. Mendelian inheritance in man and its online version, OMIM. Am. J. Human Genet. 80, 588–604 (2007).
  86. Liberzon, A. et al. The molecular signatures database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
  87. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
  88. Niepert, M., Ahmed, M. & Kutzkov, K. Learning convolutional neural networks for graphs. In International Conference on Learning Representations (ICLR, 2016).
  89. Defferrard, M., Bresson, X. & Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems 29 1–14 (NeurIPS, 2016).
  90. Li, Q., Han, Z. & Wu, X.-M. Deeper insights into graph convolutional networks for semi-supervised learning. Preprint at https://arxiv.org/abs/1801.07606 (2018).
  91. Shindjalova, R., Prodanova, K. & Svechtarov, V. Modeling data for tilted implants in grafted with bio-oss maxillary sinuses using logistic regression. In AIP Conference Proceedings Vol. 1631, 58–62 (2014).
  92. Liu, S. H. et al. DriverDBv3: a multi-omics database for cancer driver gene research. Nucl. Acids Res. 48, D863–D870 (2020).
  93. Lapuschkin, S. et al. Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).
  94. Tarjan, R. Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 146–160 (1972).
  95. Schulte-Sasse, R. EMOGI Code Release (Zenodo, 2021).
  96. Schulte-Sasse, R., Budach, S., Hnisz, D. & Marsico, A. EMOGI—Integration of Multi-Omics Data with Graph Convolutional Networks Identifies New Cancer Genes and their Associated Molecular Mechanisms (CodeOcean, 2021).

Acknowledgements

We thank M. Vingron, R. Herwig and G. Barel for fruitful discussions, and M. Vingron and C. Marr for proofreading the manuscript. We acknowledge funding from the IMPRS for Computational Biology and Scientific Computing to R.S.-S. and S.B.

Author information

Contributions

R.S.-S. and A.M. conceived the idea of EMOGI. R.S.-S. designed and implemented the model and performed data analysis. S.B. helped to implement parts of the feature interpretation framework. A.M. supervised the study and provided resources. D.H. helped with the biological interpretation of the results and editing the manuscript. R.S.-S. and A.M. wrote the manuscript.

Corresponding author

Correspondence to Annalisa Marsico.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Joel Nulsen, Kevin Y. Yip and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–22 and Methods.

Reporting Summary

Supplementary Table

Supplementary Tables 1–9.

About this article

Cite this article

Schulte-Sasse, R., Budach, S., Hnisz, D. et al. Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nat Mach Intell 3, 513–526 (2021). https://doi.org/10.1038/s42256-021-00325-y
