

Perspective

Multimodal data fusion for cancer biomarker discovery with deep learning

Abstract

Technological advances have made it possible to study a patient from multiple angles with high-dimensional, high-throughput, multiscale biomedical data. In oncology, massive amounts of data are being generated, ranging from molecular and histopathology data to radiology images and clinical records. The introduction of deep learning has greatly advanced the analysis of biomedical data. However, most approaches focus on a single data modality, and progress in methods that integrate complementary data types has consequently been slow. Developing effective multimodal fusion approaches is becoming increasingly important, because a single modality might be neither consistent nor sufficient to capture the heterogeneity of complex diseases, tailor medical care and improve personalized medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including the lack of usable data and of methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on the opportunities deep learning offers to tackle data sparsity and scarcity, to improve multimodal interpretability and to standardize datasets.
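To make the idea of multimodal fusion concrete, the sketch below contrasts two commonly used strategies: early (feature-level) fusion, which concatenates standardized features from each modality into one vector per patient, and late (decision-level) fusion, which combines predictions made separately for each modality. It is an illustration under stated assumptions, not the authors' method: it uses only NumPy, the toy arrays and the function names `early_fusion` and `late_fusion` are hypothetical, and late fusion here is a plain average that skips any modality marked missing (NaN) for a patient, one simple way to cope with incomplete multimodal data.

```python
import numpy as np


def zscore(x):
    """Standardize each feature (column) to zero mean and unit variance."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features at 0 instead of dividing by zero
    return (x - mu) / sd


def early_fusion(*modalities):
    """Early (feature-level) fusion: standardize each modality separately, then
    concatenate so one downstream model sees a single joint vector per patient.
    Rows must refer to the same patients in the same order across modalities."""
    return np.concatenate([zscore(m) for m in modalities], axis=1)


def late_fusion(probabilities):
    """Late (decision-level) fusion: average the per-modality predicted
    probabilities for each patient. NaN marks a modality that is missing for a
    patient; np.nanmean skips it instead of imputing a value. A patient with
    every modality missing gets NaN (and NumPy emits a RuntimeWarning)."""
    probs = np.asarray(probabilities, dtype=float)  # (n_modalities, n_patients)
    return np.nanmean(probs, axis=0)


# Hypothetical toy data: 4 patients, 3 gene-expression and 2 imaging features.
rng = np.random.default_rng(0)
genes = rng.normal(size=(4, 3))
imaging = rng.normal(size=(4, 2))
print(early_fusion(genes, imaging).shape)  # (4, 5)

# Hypothetical per-modality probabilities; the third patient has no imaging data.
p_genes = [0.8, 0.4, 0.6, 0.2]
p_imaging = [0.6, 0.2, np.nan, 0.4]
print(late_fusion([p_genes, p_imaging]))  # [0.7 0.3 0.6 0.3]
```

Early fusion lets a single model learn interactions between modalities but needs every modality for every patient; late fusion tolerates missing modalities but cannot model cross-modal interactions. Intermediate fusion, which merges learned representations inside a network, sits between the two.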


Fig. 1: Generation and processing of routinely collected biomedical modalities in oncology.
Fig. 2: Overview of different fusion strategies for multimodal data.
Fig. 3: Examples of model interpretability methods for histopathology and gene expression.
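As a minimal, hedged illustration of feature attribution, the sketch below applies gradient × input (chosen here for brevity; it is not necessarily one of the methods shown in Fig. 3) to a hypothetical logistic model whose input concatenates three gene-expression values and two histology-derived features. Summing the attributions within each modality's block of columns gives a crude per-modality contribution. The weights, inputs and the helpers `gradient_x_input` and `modality_contributions` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient_x_input(x, w, b):
    """Gradient x input attribution for a logistic model p = sigmoid(w . x + b).
    The gradient of p with respect to x is p * (1 - p) * w (computed
    analytically here), and each feature's attribution is gradient * x."""
    p = sigmoid(x @ w + b)
    grad = p * (1.0 - p) * w
    return p, grad * x


def modality_contributions(attr, blocks):
    """Sum feature attributions within each modality's block of columns."""
    return {name: float(attr[s].sum()) for name, s in blocks.items()}


# Hypothetical model: 3 gene-expression features followed by 2 histology features.
w = np.array([2.0, -1.0, 0.0, 0.5, 0.5])
x = np.array([1.0, 1.0, 5.0, 2.0, -2.0])
p, attr = gradient_x_input(x, w, b=-1.0)
blocks = {"genes": slice(0, 3), "histology": slice(3, 5)}
print(float(p))  # 0.5
print(modality_contributions(attr, blocks))  # {'genes': 0.25, 'histology': 0.0}
```

The third gene has the largest raw value yet receives zero attribution because the model gives it zero weight: attributions describe the model's behaviour, not necessarily the underlying biology. For a deep network, gradient × input is only a local, first-order explanation around the given input.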


References

  1. Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

    Article  Google Scholar 

  2. Riba, M., Sala, C., Toniolo, D. & Tonon, G. Big data in medicine, the present and hopefully the future. Front. Med. 6, 263 (2019).

    Article  Google Scholar 

  3. Hanahan, D. Hallmarks of cancer: new dimensions. Cancer Discov. 12, 31–46 (2022).

    Article  Google Scholar 

  4. Lu, J. et al. Multi-omics reveals clinically relevant proliferative drive associated with mTOR-MYC-OXPHOS activity in chronic lymphocytic leukemia. Nat. Cancer 2, 853–864 (2021).

    Article  Google Scholar 

  5. Medina-Martinez, J. S. et al. Isabl platform, a digital biobank for processing multimodal patient data. BMC Bioinformatics 21, 549 (2020).

    Article  Google Scholar 

  6. Chai, H. et al. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Comput. Biol. Med. 134, 104481 (2021).

    Article  Google Scholar 

  7. Dietel, M. et al. Predictive molecular pathology and its role in targeted cancer therapy: a review focussing on clinical relevance. Cancer Gene Ther. 20, 211–221 (2013).

    Article  Google Scholar 

  8. Malone, E. R., Oliva, M., Sabatini, P. J. B., Stockley, T. L. & Siu, L. L. Molecular profiling for precision cancer therapies. Genome Med. 12, 8 (2020).

    Article  Google Scholar 

  9. Campbell, M. R. Update on molecular companion diagnostics—a future in personalized medicine beyond Sanger sequencing. Expert Rev. Mol. Diagn. 20, 637–644 (2020).

    Article  Google Scholar 

  10. Colomer, R. et al. When should we order a next generation sequencing test in a patient with cancer? EClinicalMedicine 25, 100487 (2020).

    Article  Google Scholar 

  11. van Dijk, E. L., Jaszczyszyn, Y., Naquin, D. & Thermes, C. The third revolution in sequencing technology. Trends Genet. 34, 666–681 (2018).

    Article  Google Scholar 

  12. Gorzynski, J. E. et al. Ultrarapid nanopore genome sequencing in a critical care setting. N. Engl. J. Med. 386, 700–702 (2022).

    Article  Google Scholar 

  13. Davidson, M. R., Gazdar, A. F. & Clarke, B. E. The pivotal role of pathology in the management of lung cancer. J Thorac. Dis. 5, S463–S478 (2013).

    Google Scholar 

  14. Pomerantz, B. J. Imaging and interventional radiology for cancer management. Surg. Clin. North Am. 100, 499–506 (2020).

    Article  Google Scholar 

  15. Yu, K. H. & Snyder, M. Omics profiling in precision oncology. Mol. Cell. Proteomics 15, 2525–2536 (2016).

    Article  Google Scholar 

  16. Rahman, A. et al. Advances in tissue-based imaging: impact on oncology research and clinical practice. Expert Rev. Mol. Diagn. 20, 1027–1037 (2020).

    Article  Google Scholar 

  17. van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775–784 (2021).

    Article  Google Scholar 

  18. Baxi, V., Edwards, R., Montalto, M. & Saha, S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod. Pathol. 35, 23–32 (2022).

    Article  Google Scholar 

  19. Serag, A. et al. Translational AI and deep learning in diagnostic pathology. Front. Med. 6, 185 (2019).

    Article  Google Scholar 

  20. Iv, M. et al. MR imaging-based radiomic signatures of distinct molecular subgroups of medulloblastoma. Am. J. Neuroradiol. 40, 154–161 (2019).

    Article  Google Scholar 

  21. van Timmeren, J. E., Cester, D., Tanadini-Lang, S., Alkadhi, H. & Baessler, B. Radiomics in medical imaging—‘how-to’ guide and critical reflection. Insights Imaging 11, 91 (2020).

    Article  Google Scholar 

  22. Liang, J., Yang, C., Zeng, M. & Wang, X. TransConver: transformer and convolution parallel network for developing automatic brain tumor segmentation in MRI images. Quant. Imaging Med. Surg. 12, 2397–2415 (2022).

    Article  Google Scholar 

  23. Kim, M. et al. Deep learning in medical imaging. Neurospine 16, 657–668 (2019).

    Article  Google Scholar 

  24. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020).

  25. Gupta, R., Kurc, T., Sharma, A., Almeida, J. S. & Saltz, J. The emergence of pathomics. Curr. Pathobiol. Rep. 7, 73–84 (2019).

    Article  Google Scholar 

  26. Hosny, A. et al. Deep learning for lung cancer prognostication: a retrospective multi-cohort radiomics study. PLoS Med. 15, e1002711 (2018).

    Article  Google Scholar 

  27. Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).

    Article  Google Scholar 

  28. 21st Century Cures Act. H.R. 34 (114th Congress, 2016); https://www.congress.gov/114/bills/hr134/BILLS-114hr134enr.pdf

  29. Artificial intelligence and machine learning (AI/ML)-enabled medical devices. FDA (5 October 2022); https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices

  30. Proposed Regulatory Framework for Modification to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) (FDA, 2019); https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf

  31. Kann, B. H., Thompson, R., Thomas, C. R. Jr., Dicker, A. & Aneja, S. Artificial intelligence in oncology: current applications and future directions. Oncology 33, 46–53 (2019).

    Google Scholar 

  32. Louis, D. N. et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 131, 803–820 (2016).

    Article  Google Scholar 

  33. Tateishi, K., Wakimoto, H. & Cahill, D. P. IDH1 mutation and World Health Organization 2016 diagnostic criteria for adult diffuse gliomas: advances in surgical strategy. Neurosurgery 64, 134–138 (2017).

    Article  Google Scholar 

  34. Capper, D. et al. DNA-methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018).

    Article  Google Scholar 

  35. Ceccarelli, M. et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell 164, 550–563 (2016).

    Article  Google Scholar 

  36. Prior, F. et al. The public cancer radiology imaging collections of The Cancer Imaging Archive. Sci. Data 4, 170124 (2017).

    Article  Google Scholar 

  37. Hutter, C. & Zenklusen, J. C. The Cancer Genome Atlas: creating lasting value beyond its data. Cell 173, 283–285 (2018).

    Article  Google Scholar 

  38. Jennings, C. N. et al. Bridging the gap with the UK Genomics Pathology Imaging Collection. Nat. Med. 28, 1107–1108 (2022).

    Article  Google Scholar 

  39. Mo, H., Breitling, R., Francavilla, C. & Schwartz, J. M. Data integration and mechanistic modelling for breast cancer biology: current state and future directions. Curr. Opin. Endocr. Metab. Res. 24, 100350 (2022).

    Article  Google Scholar 

  40. Nalejska, E., Maczynska, E. & Lewandowska, M. A. Prognostic and predictive biomarkers: tools in personalized oncology. Mol. Diagn. Ther. 18, 273–284 (2014).

    Article  Google Scholar 

  41. Grossman, J. E., Vasudevan, D., Joyce, C. E. & Hildago, M. Is PD-L1 a consistent biomarker for anti-PD-1 therapy? The model of balstilimab in a virally-driven tumor. Oncogene 40, 1393–1395 (2021).

    Article  Google Scholar 

  42. Davis, A. A. & Patel, V. G. The role of PD-L1 expression as a predictive biomarker: an analysis of all US Food and Drug Administration (FDA) approvals of immune checkpoint inhibitors. J. Immunother. Cancer 7, 278 (2019).

    Article  Google Scholar 

  43. van Elsas, M. J., van Hall, T. & van der Burg, S. H. Future challenges in cancer resistance to immunotherapy. Cancers 12, 935 (2020).

    Article  Google Scholar 

  44. Dzobo, K. Taking a full snapshot of cancer biology: deciphering the tumor microenvironment for effective cancer therapy in the oncology clinic. OMICS 24, 175–179 (2020).

    Article  Google Scholar 

  45. Ott, M., Prins, R. M. & Heimberger, A. B. The immune landscape of common CNS malignancies: implications for immunotherapy. Nat. Rev. Clin. Oncol. 18, 729–744 (2021).

    Article  Google Scholar 

  46. Bejarano, L., Jordao, M. J. C. & Joyce, J. A. Therapeutic targeting of the tumor microenvironment. Cancer Discov. 11, 933–959 (2021).

    Article  Google Scholar 

  47. Zomer, A., Croci, D., Kowal, J., van Gurp, L. & Joyce, J. A. Multimodal imaging of the dynamic brain tumor microenvironment during glioblastoma progression and in response to treatment. iScience 25, 104570 (2022).

    Article  Google Scholar 

  48. Cheerla, A. & Gevaert, O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics 35, i446–i454 (2019).

    Article  Google Scholar 

  49. Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).

    Article  Google Scholar 

  50. Hinkson, I. V. et al. A comprehensive infrastructure for big data in cancer research: accelerating cancer research and precision medicine. Front. Cell Dev. Biol. 5, 83 (2017).

    Article  Google Scholar 

  51. Putcha, G., Gutierrez, A. & Skates, S. Multicancer screening: one size does not fit all. JCO Precis. Oncol. 5, 574–576 (2021).

    Article  Google Scholar 

  52. Mi, H. et al. Digital pathology analysis quantifies spatial heterogeneity of CD3, CD4, CD8, CD20, and FoxP3 immune markers in triple-negative breast cancer. Front. Physiol. 11, 583333 (2020).

    Article  Google Scholar 

  53. Fass, L. Imaging and cancer: a review. Mol. Oncol. 2, 115–152 (2008).

    Article  Google Scholar 

  54. Lanckriet, G. R., De Bie, T., Cristianini, N., Jordan, M. I. & Noble, W. S. A statistical framework for genomic data fusion. Bioinformatics 20, 2626–2635 (2004).

    Article  Google Scholar 

  55. Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y. & De Moor, B. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 22, e184–e190 (2006).

    Article  Google Scholar 

  56. Daemen, A. et al. A kernel-based integration of genome-wide data for clinical decision support. Genome Med. 1, 39 (2009).

    Article  MathSciNet  Google Scholar 

  57. Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A. & Kim, D. Methods of integrating data to uncover genotype–phenotype interactions. Nat. Rev. Genet. 16, 85–97 (2015).

    Article  Google Scholar 

  58. Panayides, A. S. et al. AI in medical imaging informatics: current challenges and future directions. IEEE J. Biomed. Health Inform. 24, 1837–1857 (2020).

    Article  Google Scholar 

  59. George, K., Faziludeen, S., Sankaran, P. & Joseph, K. P. Breast cancer detection from biopsy images using nucleus guided transfer learning and belief based fusion. Comput. Biol. Med. 124, 103954 (2020).

    Article  Google Scholar 

  60. Singh, S. P. et al. 3D deep learning on medical images: a review. Sensors 20, 5097 (2020).

    Article  Google Scholar 

  61. Sarvamangala, D. R. & Kulkarni, R.V. Convolutional neural networks in medical image understanding: a survey. Evol. Intell. 15, 1–22 (2021).

    Article  Google Scholar 

  62. Yuan, Q. et al. Performance of a machine learning algorithm using electronic health record data to identify and estimate survival in a longitudinal cohort of patients with lung cancer. JAMA Netw. Open 4, e2114723 (2021).

    Article  Google Scholar 

  63. Rasmy, L. et al. A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. J. Biomed. Inform. 84, 11–16 (2018).

    Article  Google Scholar 

  64. Shickel, B., Tighe, P. J., Bihorac, A. & Rashidi, P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. 22, 1589–1604 (2018).

    Article  Google Scholar 

  65. Ayala Solares, J. R. et al. Deep learning for electronic health records: a comparative review of multiple deep neural architectures. J. Biomed. Inform. 101, 103337 (2020).

    Article  Google Scholar 

  66. Hernandez-Boussard, T., Monda, K. L., Crespo, B. C. & Riskin, D. Real world evidence in cardiovascular medicine: ensuring data validity in electronic health record-based studies. J. Am. Med. Inform Assoc. 26, 1189–1194 (2019).

    Article  Google Scholar 

  67. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. 26th International Conference on Neural Information Processing Systems 3111–3119 (Curran Associates, Inc., 2013).

  68. Pennington, J., Socher, R. & Manning, C. D. GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 14, 1532–1543 (2014).

  69. Peters, M. E. et al. Deep contextualized word representations. Preprint at http://arxiv.org/abs/1802.05365 (2018).

  70. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171–4186 (Association for Computational Linguistics, 2019).

  71. Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).

    Article  Google Scholar 

  72. Huang, K., Garapati, S. & Rich, A. S. An interpretable end-to-end fine-tuning approach for long clinical text. Preprint at https://arxiv.org/abs/2011.06504 (2020).

  73. Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems 6000–6010 (Curran Associates, Inc., 2017).

  74. Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).

    Article  Google Scholar 

  75. Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4, 86 (2021).

    Article  Google Scholar 

  76. Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

    Article  Google Scholar 

  77. Jain, M. S. et al. MultiMAP: dimensionality reduction and integration of multimodal data. Genome Biol. 22, 346 (2021).

    Article  Google Scholar 

  78. Lahnemann, D. et al. Eleven grand challenges in single-cell data science. Genome Biol. 21, 31 (2020).

    Article  Google Scholar 

  79. Baltrusaitis, T., Ahuja, C. & Morency, L. P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2019).

    Article  Google Scholar 

  80. Yan, K. K., Zhao, H. & Pang, H. A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits. BMC Bioinformatics 18, 539 (2017).

    Article  Google Scholar 

  81. Pavlidis, P., Weston, J., Cai, J. & Noble, W. S. Learning gene functional classifications from multiple data types. J. Comput. Biol. 9, 401–411 (2002).

    Article  Google Scholar 

  82. Serra, A., Galdi, P. & Tagliaferri, R. in Artificial Intelligence in the Age of Neural Networks and Brain Computing 265–280 (eds Kozma, R., Alippi, C., Choe, Y., & Morabito, F. C.) (Academic Press, 2019).

  83. Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinformatics 23, bbab569 (2022).

    Article  Google Scholar 

  84. Huang, S. C., Pareek, A., Seyyedi, S., Banerjee, I. & Lungren, M. P. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. npj Digit. Med. 3, 136 (2020).

    Article  Google Scholar 

  85. Picard, M., Scott-Boyer, M. P., Bodein, A., Perin, O. & Droit, A. Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J. 19, 3735–3746 (2021).

    Article  Google Scholar 

  86. Chaudhary, K., Poirion, O. B., Lu, L. & Garmire, L. X. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin. Cancer Res. 24, 1248–1259 (2018).

    Article  Google Scholar 

  87. Huang, Z. et al. SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on breast cancer. Front. Genet. 10, 166 (2019).

    Article  Google Scholar 

  88. Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).

    Article  Google Scholar 

  89. Gevaert, O., Villalobos, V., Sikic, B. I. & Plevritis, S. K. Identification of ovarian cancer driver genes by using module network integration of multi-omics data. Interface Focus 3, 20130013 (2013).

    Article  Google Scholar 

  90. Xu, J. et al. A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data. BMC Bioinformtics 20, 527 (2019).

    Article  Google Scholar 

  91. Zhang, L. et al. Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front Genet 9, 477 (2018).

    Article  Google Scholar 

  92. Taskesen, E., Babaei, S., Reinders, M. M. & de Ridder, J. Integration of gene expression and DNA-methylation profiles improves molecular subtype classification in acute myeloid leukemia. BMC Bioinformatics 16, S5 (2015).

    Article  Google Scholar 

  93. Argelaguet, R. et al. Multi-omics factor analysis—a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018).

    Article  Google Scholar 

  94. Cancer Genome Atlas Research Network Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).

    Article  Google Scholar 

  95. Cancer Genome Atlas Research Network Integrated genomic and molecular characterization of cervical cancer. Nature 543, 378–384 (2017).

    Article  Google Scholar 

  96. Cancer Genome Atlas Research Network Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 e928 (2017).

    Article  Google Scholar 

  97. Zhang, T., Zhang, L., Payne, P. R. O. & Li, F. Synergistic drug combination prediction by integrating multiomics data in deep learning models. Methods Mol. Biol. 2194, 223–238 (2021).

    Article  Google Scholar 

  98. Preuer, K. et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 34, 1538–1546 (2018).

    Article  Google Scholar 

  99. Sammut, S. J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).

    Article  Google Scholar 

  100. Costello, J. C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014).

    Article  Google Scholar 

  101. Duan, R. et al. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput. Biol. 17, e1009224 (2021).

    Article  Google Scholar 

  102. Venugopalan, J., Tong, L., Hassanzadeh, H. R. & Wang, M. D. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci. Rep. 11, 3254 (2021).

    Article  Google Scholar 

  103. Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).

    Article  Google Scholar 

  104. Cheng, J. et al. Integrative analysis of histopathological images and genomic data predicts clear cell renal cell carcinoma prognosis. Cancer Res. 77, e91–e100 (2017).

    Article  Google Scholar 

  105. Schulz, S. et al. Multimodal deep learning for prognosis prediction in renal cancer. Front. Oncol. 11, 788740 (2021).

    Article  Google Scholar 

  106. Zhan, Z. et al. Two-stage Cox-nnet: biologically interpretable neural-network model for prognosis prediction and its application in liver cancer survival using histopathology and transcriptomic data. NAR Genom. Bioinform. 3, lqab015 (2021).

    Article  Google Scholar 

  107. Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770 (2022).

    Article  Google Scholar 

  108. Carrillo-Perez, F. et al. Machine-learning-based late fusion on multi-omics and multi-scale data for non-small-cell lung cancer diagnosis. J. Pers. Med. 12, 601 (2022).

    Article  Google Scholar 

  109. Rathore, S. et al. Radiomic MRI signature reveals three distinct subtypes of glioblastoma with different clinical and molecular characteristics, offering prognostic value beyond IDH1. Sci. Rep. 8, 5087 (2018).

    Article  Google Scholar 

  110. Mazzaschi, G. et al. Integrated MRI-immune-genomic features enclose a risk stratification model in patients affected by glioblastoma. Cancers 14, 3249 (2022).

    Article  Google Scholar 

  111. Wang, X. et al. Combining radiology and pathology for automatic glioma classification. Front. Bioeng. Biotechnol. 10, 841958 (2022).

    Article  Google Scholar 

  112. Yamaguchi, H. et al. Three-dimensional convolutional autoencoder extracts features of structural brain images with a ‘diagnostic label-free’ approach: application to schizophrenia datasets. Front. Neurosci. 15, 652987 (2021).

    Article  Google Scholar 

  113. Liu, Y. et al. Radiomic features are associated with EGFR mutation status in lung adenocarcinomas. Clin. Lung Cancer 17, 441–448 e446 (2016).

    Article  Google Scholar 

  114. Gevaert, O. et al. Predictive radiogenomics modeling of EGFR mutation status in lung cancer. Sci. Rep. 7, 41674 (2017).

    Article  Google Scholar 

  115. Nair, J. K. R. et al. Radiogenomic models using machine learning techniques to predict EGFR mutations in non-small cell lung cancer. Can. Assoc. Radiol. J. 72, 109–119 (2021).

    Article  Google Scholar 

  116. Pinker, K., Chin, J., Melsaether, A. N., Morris, E. A. & Moy, L. Precision medicine and radiogenomics in breast cancer: new approaches toward diagnosis and treatment. Radiology 287, 732–747 (2018).

    Article  Google Scholar 

  117. Itakura, H. et al. Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities. Sci. Transl. Med. 7, 303ra138 (2015).

    Article  Google Scholar 

  118. Yamamoto, S., Maki, D. D., Korn, R. L. & Kuo, M. D. Radiogenomic analysis of breast cancer using MRI: a preliminary study to define the landscape. Am. J. Roentgenol. 199, 654–663 (2012).

    Article  Google Scholar 

  119. Sutton, E. J. et al. Breast cancer subtype intertumor heterogeneity: MRI-based features predict results of a genomic assay. J. Magn. Reson. Imaging 42, 1398–1406 (2015).

    Article  Google Scholar 

  120. Li, H. et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. npj Breast Cancer 2, 16012 (2016).

    Article  Google Scholar 

  121. Li, J. et al. Imputation of missing values for electronic health record laboratory data. npj Digit. Med. 4, 147 (2021).

    Article  Google Scholar 

  122. Luo, Y. Evaluating the state of the art in missing data imputation for clinical data. Brief Bioinformatics 23, bbab489 (2022).

    Article  Google Scholar 

  123. Yoon, J., Zame, W. R. & van der Schaar, M. Estimating missing data in temporal data streams using multi-directional recurrent neural networks. IEEE Trans. Biomed. Eng. 66, 1477–1490 (2019).

    Article  Google Scholar 

  124. Zhou, T., Liu, M., Thung, K. H. & Shen, D. Latent representation learning for alzheimer’s disease diagnosis with incomplete multi-modality neuroimaging and genetic data. IEEE Trans. Med. Imaging 38, 2411–2422 (2019).

    Article  Google Scholar 

  125. Liu, Y. et al. Incomplete multi-modal representation learning for Alzheimer’s disease diagnosis. Med. Image Anal. 69, 101953 (2021).

    Article  Google Scholar 

  126. Ning, Z., Du, D., Tu, C., Feng, Q. & Zhang, Y. Relation-aware shared representation learning for cancer prognosis analysis with auxiliary clinical variables and incomplete multi-modality data. IEEE Trans. Med. Imaging 41, 186–198 (2022).

    Article  Google Scholar 

  127. Momeni, A., Thibault, M. & Gevaert, O. Dropout-enabled ensemble learning for multi-scale biomedical data. Preprint at bioRxiv https://www.biorxiv.org/content/early/2018/10/11/440362 (2018).

  128. Mehdipour Ghazi, M. et al. Training recurrent neural networks robust to incomplete data: application to Alzheimer’s disease progression modeling. Med. Image Anal. 53, 39–46 (2019).

    Article  Google Scholar 

  129. Ma, Q., Li, S. & Cottrell, G. W. Adversarial joint-learning recurrent neural network for incomplete time series classification. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1765–1776 (2022).

    Article  Google Scholar 

  130. Sharrocks, K., Spicer, J., Camidge, D. R. & Papa, S. The impact of socioeconomic status on access to cancer clinical trials. Br. J. Cancer 111, 1684–1687 (2014).

    Article  Google Scholar 

  131. Niranjan, S. J. et al. Perceived institutional barriers among clinical and research professionals: minority participation in oncology clinical trials. JCO Oncol. Pract. 17, e666–e675 (2021).

    Article  Google Scholar 

  132. Mukherkjee, D., Saha, P., Kaplun, D., Sinitca, A. & Sarkar, R. Brain tumor image generation using an aggregation of GAN models with style transfer. Sci. Rep. 12, 9141 (2022).

    Article  Google Scholar 

  133. Qin, Z., Liu, Z., Zhu, P. & Xue, Y. A GAN-based image synthesis method for skin lesion classification. Comput. Methods Programs Biomed. 195, 105568 (2020).

    Article  Google Scholar 

  134. Huang, H. H., Rao, H., Miao, R. & Liang, Y. A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression. BMC Bioinformatics 23, 353 (2022).

    Article  Google Scholar 

  135. Yufei, L. et al. Wasserstein GAN-based small-sample augmentation for new-generation artificial intelligence: a case study of cancer-staging data in biology. Engineering 5, 156–163 (2019).

    Article  Google Scholar 

  136. Wenqing, S., Tzu-Liang, T., Jianying, Z. & Wei, Q. Computerized breast cancer analysis system using three stage semi-supervised learning method. Comput. Methods Programs Biomed. 135, 77–88 (2016).

    Article  Google Scholar 

  137. Dwarikanath, M. Combining multiple expert annotations using semi-supervised learning and graph cuts for medical image segmentation. Comput. Vision Image Understanding 151, 114–123 (2016).

    Article  Google Scholar 

  138. Tran, Q. T., Alom, M. Z. & Orr, B. A. Comprehensive study of semi-supervised learning for DNA-methylation-based supervised classification of central nervous system tumors. BMC Bioinformatics 23, 223 (2022).

    Article  Google Scholar 

  139. Cheplygina, V., de Bruijne, M. & Pluim, J. P. W. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 54, 280–296 (2019).

    Article  Google Scholar 

  140. Jie, Y., Xutong, L. & Mingyue, Z. Current status of active learning for drug discovery. Artif. Intell. Life Sci. 1, 100023 (2021).

    Google Scholar 

  141. Min, W., Fan, M., Zhi-Heng, Z. & Yan-Xue, W. Active learning through density clustering. Expert Syst. Appl. 85, 305–317 (2017).

    Article  Google Scholar 

  142. Nahiyan, M. & Danilo, B. From YouTube to the brain: transfer learning can improve brain-imaging predictions with deep learning. Neural Netw. 153, 325–338 (2022).

    Article  Google Scholar 

  143. Park, Y., Hauschild, A. C. & Heider, D. Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing. NAR Genom. Bioinform. 3, lqab104 (2021).

    Article  Google Scholar 

  144. Novakovsky, G., Saraswat, M., Fornes, O., Mostafavi, S. & Wasserman, W. W. Biologically relevant transfer learning improves transcription factor binding prediction. Genome Biol. 22, 280 (2021).

    Article  Google Scholar 

  145. Ganoe, C. H. et al. Natural language processing for automated annotation of medication mentions in primary care visit conversations. JAMIA Open 4, ooab071 (2021).

    Article  Google Scholar 

  146. Krenzer, A. et al. Fast machine learning annotation in the medical domain: a semi-automated video annotation tool for gastroenterologists. Biomed. Eng. Online 21, 33 (2022).

    Article  Google Scholar 

  147. Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).

    Article  Google Scholar 

  148. Schaumberg, A. J. et al. Interpretable multimodal deep learning for real-time pan-tissue pan-disease pathology search on social media. Mod. Pathol. 33, 2169–2185 (2020).

    Article  Google Scholar 

  149. Begoli, E., Bhattacharya, T. & Kusnezov, D. The need for uncertainty quantification in machine-assisted medical decision making. Nat. Mach. Intell. 1, 20–23 (2019).

    Article  Google Scholar 

  150. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878.e6 (2022).

  151. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Preprint at https://arxiv.org/abs/1610.02391 (2016).

  152. Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).

  153. Dickinson, Q. & Meyer, J. G. Positional SHAP (PoSHAP) for interpretation of machine learning models trained from biological sequences. PLoS Comput. Biol. 18, e1009736 (2022).

  154. Steyaert, S. et al. Multimodal data fusion of adult and pediatric brain tumors with deep learning. Preprint at medRxiv https://doi.org/10.1101/2022.09.21.22280223 (2022).

  155. Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).

  156. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

  157. Mammoliti, A. et al. Orchestrating and sharing large multimodal data for transparent and reproducible research. Nat. Commun. 12, 5797 (2021).

  158. Mc Cord, K. A. et al. Current use and costs of electronic health records for clinical trial research: a descriptive study. CMAJ Open 7, E23–E32 (2019).

  159. Mc Cord, K. A. & Hemkens, L. G. Using electronic health records for clinical trials: where do we stand and where can we go? CMAJ 191, E128–E133 (2019).

  160. Makadia, R. & Ryan, P. B. Transforming the Premier Perspective Hospital Database into the Observational Medical Outcomes Partnership (OMOP) common data model. EGEMS 2, 1110 (2014).

  161. Papez, V. et al. Transforming and evaluating electronic health record disease phenotyping algorithms using the OMOP common data model: a case study in heart failure. JAMIA Open 4, ooab001 (2021).

  162. Liang, W. et al. Advances, challenges and opportunities in creating data for trustworthy AI. Nat. Mach. Intell. 4, 669–677 (2022).

  163. Costello, J. C. & Stolovitzky, G. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clin. Pharmacol. Ther. 93, 396–398 (2013).

  164. Saez-Rodriguez, J. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486 (2016).

  165. Khozin, S. et al. Real-world progression, treatment, and survival outcomes during rapid adoption of immunotherapy for advanced non-small cell lung cancer. Cancer 125, 4019–4032 (2019).

  166. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

Acknowledgements

We thank M. Humbert-Droz for discussions during the early stages of this Perspective. We are grateful for her insightful ideas and comments about these topics. We also express our great appreciation to C. Sadée and Y. Zheng for their valuable and constructive suggestions during the write-up of this Perspective.

Author information

Corresponding authors

Correspondence to Sandra Steyaert or Olivier Gevaert.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Peng Jiang, Krishna Bulusu, Po-Hsuan Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Steyaert, S., Pizurica, M., Nagaraj, D. et al. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 5, 351–362 (2023). https://doi.org/10.1038/s42256-023-00633-5

  • DOI: https://doi.org/10.1038/s42256-023-00633-5
