Multimodal machine learning in precision health: A scoping review

Kline, Adrienne; Wang, Hanyin; Li, Yikuan; Dennis, Saya; Hutch, Meghan; Xu, Zhenxing; Wang, Fei; Cheng, Feixiong; Luo, Yuan

doi:10.1038/s41746-022-00712-8

Download PDF

Review Article
Open access
Published: 07 November 2022

Multimodal machine learning in precision health: A scoping review

Adrienne Kline¹,
Hanyin Wang¹,
Yikuan Li¹,
Saya Dennis¹,
Meghan Hutch¹,
Zhenxing Xu²,
Fei Wang ORCID: orcid.org/0000-0001-9459-9461²,
Feixiong Cheng³ &
…
Yuan Luo ORCID: orcid.org/0000-0003-0195-7456¹

npj Digital Medicine volume 5, Article number: 171 (2022) Cite this article

35k Accesses
81 Citations
46 Altmetric
Metrics details

Subjects

Abstract

Machine learning is frequently being leveraged to tackle problems in the health sector including utilization for clinical decision-support. Its use has historically been focused on single modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making has been met in the biomedical field of machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in databases: PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 128 articles were included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA-approval, and analysis of how using multimodal approaches from diverse sub-populations may improve biases and healthcare disparities. These findings provide a summary on multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.

Artificial intelligence-based methods for fusion of electronic health records and imaging data

Article Open access 26 October 2022

Integrated multimodal artificial intelligence framework for healthcare applications

Article Open access 20 September 2022

High-dimensional role of AI and machine learning in cancer research

Article 10 January 2022

Introduction

Clinical decision support has long been an aim for those implementing algorithms and machine learning in the health sphere^1,2,3. Examples of algorithmic decision supports utilize lab test values, imaging protocols or clinical (physical exam scores) hallmarks^4,5. Some health diagnoses can be made on a single lab value or a single threshold, such as in diabetes in older adults⁶. Other diagnoses are based on a constellation of the signs, symptoms, lab values and/or supportive imaging and are referred to as a clinical diagnosis. Oftentimes these clinical diagnoses are based on additive scoring systems that requires an admixture of positive and negative hallmarks prior to confirmatory labeling.

The modus operandi of a clinical diagnosis may fail to consider the relative weighting of these disparate data inputs and potentially non-linear relationships highlighting the limitations of human decision-making capacity. The strength of algorithmic decision-making support is that it can be used to offload such tasks, ideally yielding a more successful result. This is the promise of precision medicine. Precision medicine/health aims to create a medical model that customizes healthcare (decisions, treatments, practices etc.) that are tailored to either an individual or patient phenotype⁷. This includes tracking patients’ health trajectories longitudinally⁸, oftentimes incorporating genetics/epigenetics^9,10 and mathematical modeling¹¹ where diagnoses and treatments incorporate this unique information¹². Contrast this with a one-drug-fits-all model, where there is a single treatment per disorder. Figure 1 illustrates the flow of information from hospitals/care centers that generate disparate data. It is through computational modeling and information fusion that outcomes of interest such as drug and treatment targets ultimately facilitate better decision making at the patient level in those care centers. This phenomenon has sparked an interest in fusion studies using health care data.

**Fig. 1: Multimodal precision health; the flow of information.**

Undertakings to characterize this literature have been performed by Huang et al.¹³, who performed a systematic review of deep learning fusion of imaging and EHR data in health. However, it was limited to EHR and imaging data and deep learning applications. A follow-up review article included a commentary on omics and imaging data fusion¹⁴. The purpose of this study is to highlight the current scope of this research domain, summarize and offer suggestions to advance the field. The current study is more inclusive in the breadth of the types of machine learning protocols used and attempts to encompass all current modalities (information types/sources).

Data fusion is underpinned by information theory and is the mechanism by which disparate data sources are merged to create an information state based on the sources’ complementarity^15,16 (Box 1). The expectation in machine learning is that data fusion efforts will result in an improvement in predictive power^17,18 and therefore provide more reliable results in potentially low validity settings¹⁹. Data fusion touts the advantage that the results of modeling become inherently more robust by relying on a multitude of informational factors rather than a single type. However, the methodology of combinatory information has drawbacks; it adds complexity to specifying the model and reduces the interpretability of results^19,20.

Data from different sources and file formats are rarely uniform, and this is especially the case with clinical data²¹. For example, data sets can have different naming conventions, units of measure, or represent different local population biases. Care must be taken to search and correct for systematic differences between datasets and assess their degree of inter-operability. For example, Colubri et al. aggregated computed tomography (CT) and PCR lab values, by performing an intra-site normalization. This ensured that the values were comparable across sites. In doing so they discarded several potentially informative clinical variables since they were not all available in all datasets²².

A balance is required to allow information that is similar to work together (harmonization) and retain data purity (information correspondence)²³. Successful fusion uses data harmonization techniques that assure both in the quality control of the integration process. Clinical data harmonization requires multidisciplinary research among medicine, biology, and computer science. The clinical area of heart failure with preserved ejection fraction (HFpEF) saw novel applications of multiple tensor factorization formulations to integrate the deep phenotypic and trans-omic information²⁴, and this extends to other areas of precision medicine²⁵. To increase the portability of EHR-based phenotype algorithms, the Electronic Medical Records and Genomics (eMERGE) network has adopted common data models (CDMs) and standardized design patterns of the phenotype algorithm logic to integrate EHR data with genomic data and enable generalizability and scalability^26,27,28,29.

There are three main types of data fusion that are used in machine learning; early (data-level), intermediate (joint), and late (decision-level)³⁰. In the case of early fusion, multiple data sources are converted to the same information space. This often results in vectorization or numerical conversion from an alternative state, such as that performed by Chen et al. via vectorized pathology reports³¹. Medical images possess characteristics that can undergo numerical conversion based on area, volume, and/or structural calculations³². These are then concatenated with additional measurements from structured data sources and fed into an individual classifier. Canonical correlation analysis³³, non-negative matrix factorization^34,35, Independent Component Analysis (ICA) and numerical feature conversion methodologies exist as common options to transform all data into the same feature space³⁶.

Intermediate data fusion occurs as a stepwise set of models and offers the greatest latitude in model architecture. For example, a 3-stage deep neural learning and fusion model was proposed by Zhou et al.³⁷. Stage 1 consists of feature selection by a soft-max classifier for independent modalities. Stages 2 and 3 constitute combining these selected features, establishing a further refined set of features, and feeding these into a Cox-nnet to perform joint latent feature representation for Alzheimer’s diagnosis. In contrast to early fusion, intermediate fusion combines the features that distinguish each type of data to produce a new representation that is more expressive than the separate representations from which it arose.

In late fusion, typically multiple models are trained where each model corresponds to an incoming data source. This is akin to ensemble learning, which offers better performance over individual models³⁸. Ensemble methods use multiple learning algorithms (typically applied to the same dataset) to obtain better predictive performance than could be obtained from any of the constituent learning algorithm alone. However, multimodal machine learning ensemble here can refer to ensemble learning within a data type or across data types. These take symbolic representations as sources and combine them to obtain a more accurate decision³⁹. Bayesian’s methods are typically employed at this level⁴⁰ to support a voting process between the set of models into a global decision. Within late fusion there has been headway made to perform multitask deep learning^{41,42,43,44,45,46,47}. A schematic for the 3 subtypes of data fusion is presented in Fig. 2. Attributes in the fusion techniques are shown in Table 1.

**Fig. 2: Early, intermediate, and late fusion; flow ofinformation from information commons to model structure to outcomes.**

Table 1 Comparison of fusion techniques.

Full size table

Box 1 Terms and Concepts

Multimodal machine learning: the area of machine learning concerned with bringing together disparate data sources to capitalize on the unique and complementary information in an algorithmic framework.

Data harmonization: using machine learning to unify different data sources to improve its quality and utilization.

Multiview machine learning: another term for multimodal machine learning.

Data fusion: the specific methodologies undertaken to perform data integration for multimodal/multiview machine learning; they come in three broad categories: early, intermediate/joint, and late.

Results

Topic Modeling

The topic modeling displayed in Fig. 3 showcases the category, specific health ailment under investigation, and the modality type for the studies included. These were subsequently mapped to the category of the combination of information that were merged to create models for prediction/classification/clustering (Table 2). This plot should serve as a resource to fellow researchers to identify areas that are less frequent, such as dermatology⁴⁸, hematology⁴⁹, medication/drug issues such as alcohol use disorder that may offer new research horizons⁵⁰. Figure 4 identifies coding platforms, publishing trend and location over time, author locations and patient cohorts of the papers included in this review.

**Fig. 3: Topic and Modality Modeling.**

**Fig. 4: Meta-data from the review process.**

Table 2 Fusion and machine learning methods included in this review.

Full size table

Model validation, techniques, and modalities used

Of the models used in the papers, 126/128 explicitly reported performing a validation procedure of them. The most common validation processes performed were N-fold cross validation (55)^51,52, train test split (51), leave one out cross validation (10), and external dataset (10). A cornucopia of machine learning techniques and methods were used within and across articles in this review. They have been summarized in Table 2, noting in which fusion umbrella subtype they were implemented.

Early fusion

Most papers were published using early fusion. Of those, most were published using medical imaging and EHR data^{34,36,48,53,56,57,60,62,63,64,68,71,73,75,85,86,87,88,90,92,93,94,95,96,98,100,104,111}. Nearly all these papers performed numericalization of image features in essence converting them to structured data prior to processing, however, two performed matrix factorization^34,36. A combination of EHR and text data was noted in 15 papers^{31,54,69,72,79,80,81,91,99,102,106,112}. Meng et al. created a Bidirectional Representation Learning model used latent Dirichlet allocation (LDA) on clinical notes¹¹². Cohen et al. used unigrams and bigrams in conjunction with medication usage⁵⁴. Zeng et al. used concept identifiers from text as input features⁸¹. Nine papers used early fusion with imaging, EHR and genomic data^{32,50,51,55,61,65,83,89,108}. Doan et al. concatenated components derived from images with polygenic risk scores⁸³. Lin et al. also created aggregated scores from MRI, cerebral spinal fluid, and genetic information and brought them together into a single cohesive extreme learning machine to predict mild cognitive impairment⁵⁵. Tremblay et al. used a multivariate adaptive regression spline (MARS) after normalizing, removing highly correlated features⁸⁹. Ten papers performed fusion using imaging and genomic data^{33,52,70,76,77,78,82,84,97,110}. Three of these generated correlation matrices as features by vectorizing imaging parameters and correlating them with single nucleotide polymorphisms (SNPs) prior to feeding into the model^33,70,78. Three papers in this category used EHR and time series^58,74,101. Both Hernandez and Canniére et al. implemented their methods for purposes of cardiac rehabilitation and harnessed the power of support vector machines (SVMs). However, Hernandez preserved time series information by assembling ECG data into tensors that preserve the structural and temporal relationships inherent in the feature space⁷⁴, while Canniére performed dimensionality reduction of the time series information using t-SNE plots⁵⁸. Two papers comprised early fusion using imaging and time series^67,103. There were two papers that leveraged EHR and genomic information^66,119. Luo et al. implemented hybrid non-negative matrix factorization (HNMF) to find coherence between phenotypes and genotypes in those suffering from hypertension¹¹⁹. One paper leveraged early fusion using imaging and text data¹⁰⁵ and another used EHR, Genomics, Transcriptomics, and Insurance Claims¹⁵⁷.

Intermediate fusion

Intermediate fusion had the second highest number of papers published. 14 used imaging and EHR data^{43,59,113,114,118,121,123,125,126,129,131,132,133,135,137}. Zihni et al. merged the output from a Multilayer Perceptron (MLP) for modeling clinical data and convolutional neural network (CNN) for modeling imaging data into a single full connected final layer to predict stroke¹³⁵. A very similar approach was taken by Tang et al. who used three-dimensional CNNs and merged the layers in the last layer¹¹³. EHR and text data were fused together in 11 papers^{41,44,80,107,109,116,122,126,134,136,142}. Of these, six^{41,44,80,122,134,142} used long term short term (LSTM) networks, CNNs, or knowledge-guided CNNs¹⁶⁰ in their fusion of EHR and clinical notes. Chowdhury et al. used graph neural networks and autoencoders to learn meta-embeddings from structured lab test results and clinical notes^107,109. Pivovarov et al. learned probabilistic phenotypes from clinical notes and medication/lab orders (EHR) data¹³⁶. Two models each employing LDA where data type was treated as a bag of elements and to bring coherence between the two models to identify unique phenotypes. Ye et al. and Shin et al. used concept identifiers via NLP and bag-of-words techniques, respectively, prior to testing a multitude of secondary models^116,126. In general, clinical notes can provide complementary information to structured EHR data, where natural language processing (NLP) is often needed to extract such information^161,162,163. A few studies were published using imaging and genomic^37,117,120. Here radiogenomics were used to diagnose attention-deficit/hyperactivity disorder (ADHD), glioblastoma survival, and dementia respectively. Polygenic risk scores were combined with MRI by Yoo et al. who used an ensemble of random forests for ADHD diagnosis¹²⁰. Zhou et al. fused SNPs information together with MRI and positron emission tomography (PET) for dementia diagnosis by learning latent representations (i.e., high-level features) for each modality independently. Subsequently learning joint latent feature representations for each pair of modality combination and then learning the diagnostic labels by fusing the learned joint latent feature representations from the second stage was carried out³⁷. Wijethilake used MRI and gene expression profiling, performing recursive feature elimination prior to merging into multiple models SVM, linear regression, and artificial neural network (ANN). The linear regression model outperformed the other two merged models and any single modality¹¹⁷. Wang et al. and Zhang et al. showcased their work in merging imaging and text information^45,46. Both used LSTM for language modeling a CNN to generate embeddings that were joined together in a dual-attention model. This is achieved by computing a context vector with attended information preserved for each modality resulting in joint learning. Seldom were articles published using: Imaging/EHR/Text¹¹⁵, Genomic/Text⁴⁹, Imaging/Time series¹²⁷, Imaging/Text/Time series⁴⁷, Imaging/EHR/Genomic¹³⁰, Imaging/EHR/Time series¹²⁴, EHR/Genomic¹²⁸, EHR/Text/Time series⁴².

Late fusion

A much smaller number (n = 20) of papers used late fusion. Seven of those used imaging and EHR data types^{138,139,144,150,151,154,164}. Both Xiong et al. and Yin et al. fed outputs into a CNN to provide a final weighting and decision^150,151. Three papers were published using a trimodal approach: imaging, EHR and genomic^130,147,148. Xu et al. and Faris et al. published papers using EHR and text data^146,155. Faris et al. processed clinical notes using TF-IDF, hashing vectorizer and document embeddings in conjunction with binarized clinical data¹⁵⁵. Logistic Regression (LR), Random Forest (RF), Stochastic Gradient Descent Classifier (SGD Classifier), and a Multilayer Perceptron (MLP) were applied to both sets of data independently and final outputs of the two models were combined using different schemes: ranking, summation, and multiplication. Two articles were published using imaging and time series^149,152 both of which employed CNNs, one in video information of neonates¹⁴⁹ and the other in chest x-rays¹⁵². However, they differed in their processing of the time series data. Salekin used a bidirectional CNN and Nishimori used a one-dimensional CNN. Far fewer papers were published using Imaging/EHR/Text¹⁵³, EHR/Genomic/Text¹⁴⁵, imaging/EHR, time series/¹⁴¹, Imaging/Genomic¹⁵⁶, EHR/Genomic¹⁴⁰, and Imaging/Text³⁹.

Mixed fusion

Two papers performed multiple data fusion architectures^158,159. Huang et al. created seven different fusion architectures. These included, early, joint, and late fusion. The architecture that performed the best was the late elastic average fusion for the diagnosis of pulmonary embolism using computed tomography and EHR data¹⁵⁹. Their Late Elastic Average Fusion leveraged an ElasticNet (linear regression with combined L1 and L2 priors that act as regularizers) for EHR variables. El-Sappagh et al. performed early and late fusion to create an interpretable Alzheimer’s diagnosis and progression detection model¹⁵⁸. Their best performing model was one that implemented instance-based explanations of the random forest classifier by using the SHapley Additive exPlanations (SHAP) feature attribution. Despite using clinical, genomic, and imaging data, the most influential feature was found to be the Mini-Mental State Examination.

Clinical relevance

Data fusion may help address sex representation and increase population diversity issues (including minority populations) in health modeling by creating a more representative dataset if one datatype contained more of one sex and another datatype contained more of the other. This reciprocal compensation ability of employing various data sets would also hold true for racial or ethnic diversities.

Less than half (37.6%) of the papers were published in a journal intended for a clinical audience. None of the papers included in the final cohort of studies had created tools for clinical use that had FDA approval. Based on the rising number of papers in this field there is a growing and global need and interest to characterize these findings.

Discussion

Returning to our research questions, we outlined from the inception of this work, we arrive at Table 3.

Table 3 Research questions as outlined in Methods.

Full size table

Many issues were raised in the papers included in this review. The most common reported limitations were cohorts from a single site, small sample sizes, retrospective data, imbalanced samples, handling of missing data, feature engineering, controlling for confounding factors, and interpretation of the models employed. Samples were most often built from a single hospital or academic medical center¹⁴⁸. Small sample sizes often lead to poor model fitting and generalizability. The median number of unique patients reported across the studies was 658 with a standard deviation of 42,600. This suggests that while some studies were able to leverage large and multi-center cohorts, a great many were not able to do so^{70,82,120,131}.

Seldom were machine learning investigations on prospective data, an issue endemic in the field⁸⁴. Sample imbalances were often ignored, which results in biased models and misleading performance metrics^75,151. Missing data were usually ignored by dropping data or imputing, if not dealt with appropriately can skew the results^68,106,173. More studies need to discuss frequencies and types of missing data^{174,175,176,177}. Comparison of different imputation methods on the final results should be part of the reporting process¹⁷⁸. When performing statistical analysis, researchers usually ignored possible confounding factors such as age or gender. Doing so may have major effects on the impact of results¹⁵³. Such possible confounding effects should either be taken into consideration by the model^179,180 or adjusted for first, prior to reporting model results. Reasonable interpretations of the model and outputs must be presented so that clinicians find the results credible and then use them to provide guidance for treatments. However, most authors did not take the time to interpret the models for clinical audiences. Additionally, how the results may function as a clinical decision support tool. Different types of models warrant different explanations^129,130. These limitations are highlighted where they occur in the data processing and modeling building pipeline in Fig. 5.

**Fig. 5: Limitations to multimodal fusion in health and proposed future directions of the fields.**

To expedite and facilitate this field, we have outlined several gaps for future research in this field. These are listed in Fig. 5 and explored. Medication/drug topics present an underrepresented area, with only two papers being published in this field^50,66. Awareness of drug interaction effects is a difficult and growing issue^{181,182,183,184}, particularly in geriatrics, which gave rise to Beer’s criteria¹⁸⁵. Performing multimodal machine learning may offer an earlier detection of adverse events associated with medication misuse that is a result of iatrogenic error, non-compliance, or addiction. Similar justifications as outlined above could be applied to other areas seen as ‘under saturated’ such as hematology with only one paper⁴⁹ and nephrology having just three^41,87,99.

Augmenting clinical decision-making with ML to improve clinical research and outcomes offers positive impacts that have economic, ethical, and moral ramifications, as it can reduce suffering and save human lives. Multiple studies have now pointed out that if the data an ML model is trained on is biased this often yields bias in the predictions^186,187. Ensuring multisite, representative data will limit model biases. We also advocate for the creation of open access pipelines/libraries to speed up data conversion to make the technology more widely available^188,189. Improving accuracy at the expense of complex and time-consuming data transformations may mean the predictive power gained from a multimodal approach is offset by this front-end bottleneck, meaning predictions are no longer temporally relevant or useful.

While incorporating disparate data does lend itself to seemingly better predictions¹³⁹, as knowledge around certain diseases accumulates, data fusion in healthcare is an evolving target that warrants proactively adapting to the dynamic landscape¹⁹⁰. There is no single ML model with ubiquitous applicability. For example, it has been shown in protein-protein interactions that utilization of the XGBoost ensemble algorithm reduces noisy features, maintains the significant raw features, and prevents overfitting¹²². Similarly, LightGBM¹⁹¹ has the advantages of faster training speed, higher efficiency, lower memory usage, better accuracy¹⁹², and has been consistently outperforming other models^193,194. Graph neural networks can synthesize new connections leading to drug discovery/targets¹²².

In the same vein, models that permit interpretability should always be considered. For example, the Perotte et al.⁹⁹ model was not compared with conventional simpler machine learning classifiers, and collective matrix factorization becomes inherently difficult to interpret⁷⁹. Contrast this with the work of Fraccaro et al. whose study of macular degeneration noted their white box performed as well as black box methods implementions⁶⁸.

As this field and the datasets associated mature there is work needed to address the tenets of data management: Findability, Accessibility, Interoperability, and Reuse of digital datasets (FAIR)¹⁹⁵. This entails having metadata that are unique/de-identified and searchable, with open or federated access points (Findability/Accessibility), data that are shared broadly (Interoperable), and finally data that contain accurate and relevant attributes under a clear data usage agreement/license (Reusable). It is imperative there exist a clear definition of outcomes, assessment of biases and interpretability/transparency of results, and limitations inherent in its predictions¹⁹⁶.

Of crucial importance for uptake is that predictions be patient-specific and actionable at a granular level¹⁹⁷. For example, a 30-day readmission prediction algorithm¹⁰⁶, if implemented, may inform resource management and prompt additional research that may decrease the number of patients re-admitted. Linden et al. developed Deep personalized LOngitudinal convolutional RIsk model (DeepLORI) capable of creating predictions that can be interpreted on the level of individual patients¹²². Leveraging both and clinical and empirically driven information to create meaningful and usable recommendations¹³⁶ may improve clinician/end-user under understanding by relating to existing frameworks. Resources such as CRISP-ML provide a framework for moving use cases into more practical applications¹⁹⁸, while efforts to vie for Food and drug administration (FDA) approvals as a tool for use are encouraged to increased adoption.

Deployment of models with user interfaces annotating limitations inherent on those predictions¹⁹⁶ will allow clinical decision makers to interface and implement change accordingly. Follow-through on the aforementioned tasks will push individual fields to create recommendations for subsequent real-world implementations that are relevant, actionable, and transcend regional/subpopulation differences. Limitations of this scoping review include that it is not a systematic review. Therefore, it is possible that some titles that should have been included were missed. As the primary purpose of this study was to perform scientific paper profiling on multimodal machine learning in health, a critical appraisal of individual methodological quality of the included studies was not performed. However, commentary is provided on the methodological limitations that could have affected their results and impacted their claims. This review offers comprehensive meta-data and evaluation across health domains, immaterial to the type of machine learning or the data used. This work serves as both a summary and steppingstone for future research in this field. Data fusion in health is a growing field of global interest. The topic areas of health that have high frequency relative to others were neurology and cancer, which serve to highlight opportunities for further exploration in understudied topics (hematology, dermatology). Unimodal machine learning is inherently in contrast to current routine clinical practice in which imaging, clinical or genomic data are interpreted in unison to inform accurate diagnosis and warrants further work for ease of use and implementation. Overall, it appears justified to claim that multi-modal data fusion increases predictive performance over unimodal approaches (6.4% mean improvement in AUC) and is warranted where applicable. Multimodal machine learning may be a tool leveraged in precision medicine to further subgroup patients’ and their unique health fingerprint. Furthermore, as no papers in our review sought FDA approval, we advocate for more efforts into model translation and explore necessities that facilitate that end.

A dashboard resource published in conjunction with this review article is available at: https://multimodal-ml-health.herokuapp.com/. This dashboard was created as an interactive infographic-based display of the major findings presented in this paper. To foster future work, a drop-down menu was created to help researchers filter the underlying data file of titles based on the specific overarching health topic by selection. This will facilitate the location of relevant papers.

Methods

Search strategy and selection criteria

Inclusion requirements were: (a) original research article; (b) published within the last 10 years (encompassing years 2011–2021); (c) published in English; and (d) on the topic of multi-modal or multi-view using machine learning in health for diagnostic or prognostication applications.’Multi-modal’ or’multi-view’ for our context means the multiple data sources were not of the same type. For example, while a paper using CT and MRI may be considered multi-modal imaging; however, under our criteria it would be considered uni-modal (i.e., only included imaging). Exclusions for the purposes of this review were: (a) scientific articles not published in English; (b) commentaries or editorials; or (c) other review articles. Papers were also excluded if the data were not human-derived. We also excluded papers where the fusion already occurred at the data generation stage, such as spatial transcriptomics producing integrated tissue imaging and transcriptomics data^199,200,201. All papers underwent a 2-person verification for inclusion in the manuscript.

Search strings were established via literature searches and domain expertize. Additional keywords were identified based on key word co-occurrence matrices established from the abstracts of the previously included articles. Figure 6a displays the search strings, where an individual string would include one keyword from each column, this was performed for all combinations of search strings. An overview of the inclusion/exclusion process is noted in Fig. 6b and follows the standard set by PRISMA extension for scoping reviews²⁰².

**Fig. 6: Overview of our PRIMSA-SCR process.**

Data extracted

Information garnered from the articles included title, year published, FDA approval of the tool, whether published in a clinical journal, author affiliations, number of authors, locations (continents), and abstract. Health topic(s) addressed were extracted, as well as the broader medical topic(s) that encompass the disease. For example, lung cancer would be the specific disease in question. It arises from the topics of Cancer and Respiratory according to our classification. Health topic classification was overseen and reviewed by a medical doctor to ensure accuracy. As multiple health topics often encompassed a single health disease addressed in each paper, several papers are counted twice. This is true when being mapped from the right side of the Sankey plot to the specific health disease in the middle.

We recorded and extracted the number of different modalities and the divisions (i.e., text/image vs EHR/genomic/time series) used. The objective of each paper was extracted in a 1–2 sentence summary along with the keyword (if available). Patient characterization in the studies was performed by ascertaining the number of unique patients in the cohort and patient sex (i.e., Men/women/both or not mentioned).

Computational information extracted included: (a) the coding interface(s) used in data processing/analysis, (b) machine learning type, (c) data merging technique (early, intermediate, late), and (d) types of machine learning algorithms used. Whether validation was performed (yes/no), the statistical tests run, the nature of the validation, and outcomes measures were all recorded for each paper. The significance, impact, and limitations of each paper were extracted by reviewing the primary findings and limitations as noted in the papers.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

References

Nils, B. H. & Sabine, S. The ethics of machine learning-based clinical decision support: an analysis through the lens of professionalisation theory. BMC Med. Ethics 22, 112 (2021).
Sanchez-Pinto, L. N., Luo, Y. & Churpek, M. M. Big data and data science in critical care. Chest 154, 1239–1248 (2018).
Article PubMed PubMed Central Google Scholar
Miotto, R., Wang, F., Wang, S., Jiang, X. & Dudley, J. T. Deep learning for healthcare: Review, opportunities, and challenges. Brief Bioinform. 19, 1236–1246 (2017).
Timothy, J. W. D. et al. Machine learning of threedimensional right ventricular motion enables outcome prediction in pulmonary hypertension: A cardiac MR imaging study. Radiology 283, 381–390 (2017).
Article Google Scholar
Gigi, F. S., Gregory, R. H., Bradley, J. N. & Jun, D. Predicting breast cancer risk using personal health data and machine learning models. PLoS One 14, e0226765 (2019).
Michael, F., Justin, B. E.-T. & Elizabeth, S. Clinical and public health implications of 2019 endocrine society guidelines for diagnosis of diabetes in older adults. Diabetes Care 43, 1456–1461 (2020).
Article Google Scholar
Gambhir, S. S., Ge, T. J., Vermesh, O. & Spitler, R. Toward achieving precision health. Sci. Transl. Med. 10, eaao3612 (2018).
Article PubMed PubMed Central Google Scholar
Schüssler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804 (2019).
Article PubMed Google Scholar
Feero, W. G. Introducing “genomics and precision health”. JAMA 317, 1842–1843 (2017).
Article PubMed Google Scholar
Kellogg, R. A., Dunn, J. & Snyder, M. P. Personal omics for precision health. Circulation Res. 122, 1169–1171 (2018).
Article CAS PubMed Google Scholar
Thapa, C. & Camtepe, S. Precision health data: Requirements, challenges and existing techniques for data security and privacy. Comput. Biol. Med. 129, 104130 (2021).
Article PubMed Google Scholar
Pranata, S. et al. Precision health care elements, definitions, and strategies for patients with diabetes: A literature review. Int. J. Environ. Res. Public Health 18, 6535 (2021).
Article PubMed PubMed Central Google Scholar
Shih Cheng, H., Anuj, P., Saeed, S., Imon, B. & Matthew, P. L. Fusion of medical imaging and electronic health records using deep learning: A systematic review and implementation guidelines. npj Digital Med. 3, 136 (2020).
Weixian, H., Kaiwen, T., Jinlong, H., Ziye, Z. & Shoubin, D. A review of fusion methods for omics and imaging data. In IEEE/ACM Trans Comput Biol Bioinform (IEEE, 2022).
Federico, C. A review of data fusion techniques. Scientific World J. 2013, 704504 (2013).
Alan, N. S., Christopher, L. B. & Franklin, E. W. Revisions to the JDL data fusion model. Sens. Fusion.: Architectures, Algorithms, Appl. III 3719, 430 (1999). publisher = SPIE.
Google Scholar
Erik, M.-M.-R., Antonio, A. A., Ramon, F. B. & Enrique, G.-C. Improved accuracy in predicting the best sensor fusion architecture for multiple domains. Sensors 21, 7007 (2021).
Ahmad, F. S., Luo, Y., Wehbe, R. M., Thomas, J. D. & Shah, S. J. Advances in machine learning approaches to heart failure with preserved ejection fraction. Heart Fail Clin. 18, 287–300 (2022).
Article PubMed Google Scholar
Erik, B. et al. Machine learning/artificial intelligence for sensor data fusion-opportunities and challenges. In IEEE Aerospace and Electronic Systems Magazines Vol. 36, 80–93 (IEEE, 2021).
Li, Y., Wu, X., Yang, P., Jiang, G. & Luo, Y. Machine learning applications in diagnosis, treatment, and prognosis of lung cancer. Preprint at https://arxiv.org/abs/2203.02794 (2022).
Kohane, I. S. et al. What every reader should know about studies using electronic health record data but may be afraid to ask. J. Med. Internet Res. 23, e22219 (2021).
Article PubMed PubMed Central Google Scholar
Andres, C. et al. Machine-learning Prognostic Models from the 2014–16 Ebola Outbreak: Data-harmonization challenges, validation strategies, and mHealth applications. EClinicalMedicine 11, 54–64 (2019).
Article Google Scholar
Afshin, J., Jean Pierre, P. & Johanne, M.-P. Machine-learning-based patient-specific prediction models for knee osteoarthritis. Nat. Rev. Rheumatol. 15, 49–60 (2019).
Article Google Scholar
Luo, Y., Ahmad, F. S. & Shah, S. J. Tensor factorization for precision medicine in heart failure with preserved ejection fraction. J. Cardiovasc. Transl. Res. 10, 305–312 (2017).
Luo, Y., Wang, F. & Szolovits, P. Tensor factorization toward precision medicine. Briefings Bioinform. 18, 511–514 (2016).
Rasmussen L. et al. Considerations for improving the portability of electronic health record-based phenotype algorithms. In Proceedings of 2019 AMIA Annual Symposium 2019 (2019).
Zhong, Y. et al. Characterizing design patterns of EHR-driven phenotype extraction algorithms. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1143–1146 (IEEE, 2018).
Rasmussen, L. V. et al. Solutions for unexpected challenges encountered when integrating research genomics results into the EHR. ACI Open 4, e132–e5 (2020).
Article Google Scholar
Shang, N. et al. Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J. Biomed. Inf. 99, 103293 (2019).
Article Google Scholar
Dana, L., Tulay, A. & Christian, J. Multimodal data fusion: An overview of methods, challenges, and prospects. Proc. IEEE 103, 1449–1477 (2015).
Article Google Scholar
Wei, C., Yungui, H., Brendan, B. & Simon, L. The utility of including pathology reports in improving the computational identification of patients. J. Pathol. Inform. 7, 46 (2016).
Yubraj, G., Ramesh Kumar, L. & Goo Rak, K. Prediction and classification of Alzheimer’s disease based on combined features from apolipoprotein-E genotype, cerebrospinal fluid, MR, and FDG-PET imaging biomarkers. Front. Comput. Neurosci. 13, 72 (2019).
Xia An, B., Xi, H., Hao, W. & Yang, W. Multimodal data analysis of Alzheimer’s disease based on clustering evolutionary random forest. IEEE J. Biomed. Health Inform. 24, 2973–2983 (2020).
Article Google Scholar
Ariana, A. et al. Non-negative matrix factorization of multimodal MRI, fMRI and phenotypic data reveals differential changes in default mode subnetworks in ADHD. Neuroimage 102, 207–219 (2014).
Article Google Scholar
Chao, G., Luo, Y. & Ding, W. Recent advances in supervised dimension reduction: A survey. Mach. Learn. Knowl. Extraction 1, 341–358 (2019).
Article Google Scholar
Pillai, P. S. L. T. Y. Fusing heterogeneous data for Alzheimer’s disease classification. Stud. Health Technol. Inf. 216, 731–735 (2015).
Google Scholar
Tao, Z., Kim Han, T., Xiaofeng, Z. & Dinggang, S. Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Hum. Brain Mapp. 40, 1001–1016 (2019).
Article Google Scholar
Robi, P. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 6, 21–44 (2006).
Article Google Scholar
Francesco, C. & Marwa, M. Multimodal temporal machine learning for bipolar disorder and depression recognition. Pattern Analysis Appl. 25, 493–504 (2021).
Durrant-Whyte, H. F. Sensor models and multisensor integration. Int. J. Robot. Res. 7, 97–113 (1988).
Article Google Scholar
Xu, Z. et al. Identification of predictive sub-phenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. J. Biomed. Inform. 102, 103361 (2019).
Article Google Scholar
Dongdong, Z., Changchang, Y., Jucheng, Z., Xiaohui, Y. & Ping, Z. Combining structured and unstructured data for predictive models: A deep learning approach. BMC Med. Inform. Decis. Mak. 20, 280 (2020).
Shaker, E.-S., Tamer, A., Islam, S. M. R. & Kyung Sup, K. Multimodal multitask deep learning model for Alzheimer’s disease progression detection based on time series data. Neurocomputing 412, 197–215 (2020).
Article Google Scholar
Haiyang, Y., Li, K. & Feng Qiang, X. Multimodal temporal-clinical note network for mortality prediction. J. Biomed. Semantics 12, 3 (2021).
Zizhao, Z., Pingjun, C., Manish, S. & Lin, Y. TandemNet: Distilling knowledge from medical images using diagnostic reports as optional semantic references. International Conference on Medical Image Computing and Computer-Assisted Intervention, 10435, 320–328 (Springer, Cham, 2017).
Xiaosong, W., Yifan, P., Le, L., Zhiyong, L. & Ronald, M. S. TieNet: Text-Image embedding network for common thorax disease classification and reporting in chest X-rays. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 9049–9058 (IEEE Computer Society, 2018).
Syed Arbaaz, Q., Sriparna, S., Mohammed, H., Gael, D. & Erik, C. Multitask representation learning for multimodal estimation of depression level. IEEE Intell. Syst. 34, 45–52 (2019).
Article Google Scholar
Jordan, Y., William, Y. & Philipp, T. Multimodal skin lesion classification using deep learning. Exp. Dermatol. 27, 1261–1267 (2018).
Article Google Scholar
Kai, Z et al. MLMDA: A machine learning approach to predict and validate MicroRNA-disease associations by integrating of heterogenous information sources. J. Transl. Med. 17, 260 (2019).
Sivan, K. et al. Predicting risk for Alcohol Use Disorder using longitudinal data with multimodal biomarkers and family history: A machine learning study. Mol. Psychiatry 26, 1133–1141 (2021).
Article Google Scholar
Mara Ten, K et al. MRI predictors of amyloid pathology: Results from the EMIF-AD Multimodal Biomarker Discovery study. Alzheimer’s Res. Ther. 10, 100 (2018).
Isamu, H. et al. Radiogenomics predicts the expression of microRNA-1246 in the serum of esophageal cancer patients. Sci. Rep. 10, 2532 (2020).
Jesus, J. C., Jianhua, Y. & Daniel, J. M. Enhancing image analytic tools by fusing quantitative physiological values with image features. J. Digit Imaging 25, 550–557 (2012).
Article Google Scholar
Kevin Bretonnel, C. et al. Methodological issues in predicting pediatric epilepsy surgery candidates through natural language processing and machine learning. Biomed. Inform. Insights 8, BII.S38308 (2016).
Article Google Scholar
Weiming, L. et al. Predicting Alzheimer’s disease conversion from mild cognitive impairment using an extreme learning machine-based grading method with multimodal data. Front. Aging Neurosci. 12, 77 (2020).
Micah, C. et al. Predicting rehospitalization within 2 years of initial patient admission for a major depressive episode: A multimodal machine learning approach. Transl. Psychiatry 9, 285 (2019).
Jongin, K. & Boreom, L. Identification of Alzheimer’s disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine. Hum. Brain Mapp. 39, 3728–3741 (2018).
Article Google Scholar
Hélène De, C. et al. Wearable monitoring and interpretable machine learning can objectively track progression in patients during cardiac rehabilitation. Sensors (Switz.) 20, 1–15 (2020).
Google Scholar
Tamer, A., Shaker, E.-S. & Jose, M. A. Robust hybrid deep learning models for Alzheimer’s progression detection. Knowledge-Based Syst. 213, 106688 (2021).
Jeungchan, L. et al. Machine learning-based prediction of clinical pain using multimodal neuroimaging and autonomic metrics. Pain 160, 550–560 (2019).
Article Google Scholar
Uttam, K., Goo Rak, K. & Horacio, R.-G. An efficient combination among sMRI, CSF, cognitive score, and APOE ϵ 4 biomarkers for classification of AD and MCI using extreme learning machine. Comput. Intell. Neurosci. 2020, 8015156 (2020).
Bo, C., Mingxia, L., Heung, I. S., Dinggang, S. & Daoqiang, Z. Multimodal manifold-regularized transfer learning for MCI conversion prediction. Brain Imaging Behav. 9, 913–926 (2015).
Article Google Scholar
Kevin, H., Ulrike, L., Markus, M. & Katja, B-B. Separating generalized anxiety disorder from major depression using clinical, hormonal, and structural MRI data: A multimodal machine learning study. Brain Behavior 7, e00633 (2017).
Fayao, L., Luping, Z., Chunhua, S. & Jianping, Y. Multiple kernel learning in the primal for multimodal alzheimer’s disease classification. IEEE J. Biomed. Health Inform. 18, 984–990 (2014).
Article Google Scholar
Diego, C.-B. et al. Robust ensemble classification methodology for I123-Ioflupane SPECT images and multiple heterogeneous biomarkers in the diagnosis of Parkinson’s disease. Front. Neuroinform. 12, 53 (2018).
Yi, Z. et al. Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases. BMC Bioinform. 19, 517 (2018).
Chin Po, C., Susan Shur Fen, G. & Chi Chun, L. Toward differential diagnosis of autism spectrum disorder using multimodal behavior descriptors and executive functions. Comput. Speech Lang. 56, 17–35 (2019).
Article Google Scholar
Paolo, F. et al. Combining macula clinical signs and patient characteristics for age-related macular degeneration diagnosis: A machine learning approach Retina. BMC Ophthalmol. 15, 10 (2015).
Benjamin, D. W. et al. Early identification of epilepsy surgery candidates: A multicenter, machine learning study. Acta Neurol. Scand. 144, 41–50 (2021).
Article Google Scholar
Xia An, B., Wenyan, Z., Lou, L. & Zhaoxu, X. Detecting risk gene and pathogenic brain region in EMCI using a novel GERF algorithm based on brain imaging and genetic data. IEEE J. Biomed. Health Inform. 25, 3019–3028 (2021).
Article Google Scholar
Prashanth, R., Sumantra Dutta, R., Pravat, K. M. & Shantanu, G. High-accuracy detection of early Parkinson’s disease through multimodal features and machine learning. Int. J. Med. Inf. 90, 13–21 (2016).
Article CAS Google Scholar
Ali, A.-M. et al. Machine learning for localizing epileptogenic-zone in the temporal lobe: Quantifying the value of multimodal clinical-semiology and imaging concordance. Front. Digital Health 3, 559103 (2021).
Baiying, L. et al. Assessment of liver fibrosis in chronic hepatitis B via multimodal data. Neurocomputing 253, 169–176 (2017).
Article Google Scholar
Larry, H. et al. Multimodal tensor-based method for integrative and continuous patient monitoring during postoperative cardiac care. Artif. Intell. Med. 113, 102032 (2021).
Ivo, D. D. et al. Predictive big data analytics: A study of Parkinson’s disease using large, complex, heterogeneous, incongruent, multi-source and incomplete observations. PLoS One 11, e0157077 (2016).
Eleftherios, T., Ioannis, S., Apostolos, H. K. & Kostas, M. Deep radiotranscriptomics of non-small cell lung carcinoma for assessing molecular and histology subtypes with a data-driven analysis. Diagnostics 11, 2383 (2021).
Hua, W. et al. Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics 28, i127–36 (2012).
Xia, B., Xi, H., Yiming, X. & Hao, W. A novel CERNNE approach for predicting Parkinson’s Disease-associated genes and brain regions based on multimodal imaging genetics data. Med. Image Anal. 67, 101830 (2021).
Vijay, H. et al. Predicting complications in critical care using heterogeneous clinical data. IEEE Access 4, 7988–8001 (2016).
Article Google Scholar
Wang, H., Li, Y., Khan, S. A. & Luo, Y. Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network. Artif. Intell. Med. 110, 101977 (2020).
Article PubMed PubMed Central Google Scholar
Zeng, Z. et al. Identifying breast cancer distant recurrences from electronic health records using machine learning. J. Healthcare Inform. Res. 3, 283–299 (2019).
Kautzky, A. et al. Machine learning classification of ADHD and HC by multimodal serotonergic data. Transl. Psychiatry 10, 104 (2020).
Nhat Trung, D. et al. Distinct multivariate brain morphological patterns and their added predictive value with cognitive and polygenic risk scores in mental disorders. NeuroImage: Clin. 15, 719–731 (2017).
Article Google Scholar
Niha, B. et al. Radiogenomic analysis of hypoxia pathway is predictive of overall survival in Glioblastoma. Sci. Rep. 8, 7 (2018).
Jan, C. P. et al. Combining multimodal imaging and treatment features improves machine learning-based prognostic assessment in patients with glioblastoma multiforme. Cancer Med. 8, 128–136 (2019).
Article Google Scholar
Hao, Z. et al. Machine learning reveals multimodal MRI patterns predictive of isocitrate dehydrogenase and 1p/19q status in diffuse low- and high-grade gliomas. J. Neurooncol 142, 299–307 (2019).
Article Google Scholar
Namyong, P. et al. Predicting acute kidney injury in cancer patients using heterogeneous and irregular data. PLoS One 13, e0199839 (2018).
Wei Liang, T., Chee Kong, C., Sim Heng, O. & Alvin Choong Meng, N. Ensemble-based regression analysis of multimodal medical data for osteopenia diagnosis. Expert Syst. Appl. 40, 811–819 (2013).
Article Google Scholar
Sébastien, T., Yasser, I.-M., José María, M.-P., Alan, C. E. & Louis De, B. Defining a multimodal signature of remote sports concussions. Eur. J. Neurosci. 46, 1956–1967 (2017).
Article Google Scholar
Gianluca, B. et al. Multimodal predictive modeling of endovascular treatment outcome for acute ischemic stroke using machine-learning. Stroke 51, 3541–3551 (2020).
Yiming, X. et al. Explainable dynamic multimodal variational autoencoder for the prediction of patients with suspected central precocious puberty. IEEE J. Biomed. Health Inform. 26, 1362–1373 (2021).
Alan, D. K. et al. Mixture model framework for traumatic brain injury prognosis using heterogeneous clinical and outcome data. IEEE J. Biomed. Health Inform. 26, 1285–1296 (2021).
Casper, R. et al. Preoperative risk stratification in endometrial cancer (ENDORISK) by a Bayesian network model: A development and validation study. PLoS Med. 17, e1003111 (2020)
Tommaso, G. et al. SARS-COV-2 comorbidity network and outcome in hospitalized patients in Crema, Italy. PLoS One 16, e0248498 (2021).
Huan, Q. et al. Machine-learning radiomics to predict early recurrence in perihilar cholangiocarcinoma after curative resection. Liver Int. 41, 837–850 (2021).
Article Google Scholar
Ramon, C. et al. Alzheimer’s disease risk assessment using large-scale machine learning methods. PLoS One 8, e77949 (2013).
Aleksei, T. et al. Multimodal machine learning-based knee osteoarthritis progression prediction from plain radiographs and clinical data. Sci. Rep. 9, 20038 (2019).
Michael, J. D. et al. Development and validation of a novel automated Gleason grade and molecular profile that define a highly predictive prostate cancer progression algorithm-based test. Prostate Cancer Prostatic Dis. 21, 594–603 (2018).
Article Google Scholar
Perotte, A., Ranganath, R., Hirsch, J. S., Blei, D. & Elhadad, N. Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. J. Am. Med. Inf. Assoc. 22, 872–880 (2015).
Article Google Scholar
Lei, Y., Yalin, W., Paul, M. T., Vaibhav, A. N. & Jieping, Y. Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61, 622–632 (2012).
Article Google Scholar
Yanbo, X., Siddharth, B., Shriprasad, R. D., Kevin, O. M. & Jimeng, S. RAIM: Recurrent attentive and intensive model of multimodal patient monitoring data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2565–2573 (Association for Computing Machinery, 2018).
Yixue, H., Mohd, U., Jun, Y., Hossain, M. S. & Ahmed, G. Recurrent convolutional neural network based multimodal disease risk prediction. Future Gener. Computer Syst. 92, 76–83 (2019).
Article Google Scholar
Shinichi, G., et al. Artificial intelligence-enabled fully automated detection of cardiac amyloidosis using electrocardiograms and echocardiograms. Nat. Commun. 12, 2726 (2021).
Rui, Y. et al. Richer fusion network for breast cancer classification based on multimodal data. BMC Med. Inform. Decis. Mak. 21, 134 (2021).
Li, Y., Wang, H. & Luo, Y. A comparison of pre-trained vision-and-language models for multimodal representation learning across medical images and reports. 2020 IEEE Int. Conf. Bioinform. Biomedicine (BIBM) 2020 1999–2004 (2020).
Sara Bersche, G. et al. A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: A retrospective analysis of electronic medical records data. BMC Med. Inform. Decis. Mak. 18, 44 (2018).
Chowdhury, S., Zhang, C., Yu, P. S. & Luo, Y. Mixed pooling multi-view attention autoencoder for representation learning in healthcare. Preprint at https://arxiv.org/abs/1910.06456 (2019).
Ilan, S. et al. An unsupervised learning approach to identify novel signatures of health and disease from multimodal data. Genome Med. 12, 7 (2020).
Chowdhury, S., Zhang, C., Yu, P. S. & Luo, Y. Med2Meta: Learning representations of medical concepts with meta-embeddings. HEALTHINF 2020, 369–376 (2020).
Google Scholar
Subramanian V, Do MN, Syeda-Mahmood T. Multimodal fusion of imaging and genomics for lung cancer recurrence prediction, IEEE 17th International Symposium on Biomedical Imaging (ISBI), 804–808, (2020).
Michele, D. et al. Combining heterogeneous data sources for neuroimaging based diagnosis: Re-weighting and selecting what is important. Neuroimage 195, 215–231 (2019).
Article Google Scholar
Yiwen, M., William, S., Michael, K. O. & Corey, W. A. Bidirectional representation learning from transformers using multimodal electronic health record data to predict depression. IEEE J. Biomed. Health Inform. 25, 3121–3129 (2021).
Article Google Scholar
Xing, T. et al. Elaboration of a multimodal MRI-based radiomics signature for the preoperative prediction of the histological subtype in patients with non-small-cell lung cancer. BioMed. Engineering Online 19, 5 (2020).
Kathleen, C. F., Kristina Lundholm, F., Marie, E., Fredrik, Ö & Dimitrios, K. Predicting MCI status from multimodal language data using cascaded classifiers. Front. Aging Neurosci. 11, 205 (2019).
Amir Hossein, Y. et al. Multimodal mental health analysis in social media. PLoS One 15, e0226248 (2020).
Ye, J., Yao, L., Shen, J., Janarthanam, R. & Luo, Y. Predicting mortality in critically ill patients with diabetes using machine learning and clinical notes. BMC Med Inf. Decis. Mak. 20, 1–7 (2020).
CAS Google Scholar
Navodini, W., Mobarakol, I. & Hongliang, R. Radiogenomics model for overall survival prediction of glioblastoma. Med Biol. Eng. Comput 58, 1767–1777 (2020).
Article Google Scholar
Solale, T. et al. A distributed multitask multimodal approach for the prediction of Alzheimer’s disease in a longitudinal study. Neuroimage 206, 116317 (2020).
Luo, Y. et al. Integrating hypertension phenotype and genotype with hybrid non-negative matrix factorization. Bioinformatics 35, 1395–403 (2019).
Article CAS PubMed Google Scholar
Jae Hyun, Y., Johanna Inhyang, K., Bung Nyun, K. & Bumseok, J. Exploring characteristic features of attention-deficit/hyperactivity disorder: Findings from multi-modal MRI and candidate genetic data. Brain Imaging Behav. 14, 2132–2147 (2020).
Article Google Scholar
Chao, T., Baoyu, L., Jun, L. & Zhigao, Z. A Deep automated skeletal bone age assessment model with heterogeneous features learning. J. Med. Syst. 42, 249 (2018).
Cheng, C. et al. Improving protein–protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput. Biol. Med. 123 (2020).
Xueyi, Z. et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat. Commun. 11, 1236 (2020).
Juan Camilo, V.-C. et al. Multimodal assessment of Parkinson’s disease: A deep learning approach. IEEE J. Biomed. Health Inform. 23, 1618–1630 (2019).
Article Google Scholar
Ping, Z. et al. Deep-learning radiomics for discrimination conversion of Alzheimer’s disease in patients with mild cognitive impairment: A study based on 18F-FDG PET imaging. Front. Aging Neurosci. 13, (2021).
Shin, J., Li, Y. & Luo, Y. Early prediction of mortality in critical care setting in sepsis patients using structured features and unstructured clinical notes. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2885–2890 (IEEE, 2021).
Jayachitra, V. P., Nivetha, S., Nivetha, R. & Harini, R. A cognitive IoT-based framework for effective diagnosis of COVID-19 using multimodal data. Biomed. Signal Processing Control 70, 102960 (2021).
Thomas, L. et al. An explainable multimodal neural network architecture for predicting epilepsy comorbidities based on administrative claims data. Front. Artificial Intelligence 4, 610197 (2021).
Alan Baronio, M., Carla Diniz Lopes, B. & Silvio Cesar, C. Computer-aided diagnosis of hepatocellular carcinoma fusing imaging and structured health data. Health Inform. Sci. Syst. 9, 20 (2021)
Janani, V., Li, T., Hamid Reza, H. & May, D. W. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci. Rep. 11, 3254 (2021).
Cam Hao, H. et al. Bimodal learning via trilogy of skip-connection deep networks for diabetic retinopathy risk progression identification. Int. J. Med. Inform. 132, 103926 (2019).
Yucheng, T. et al. Prediction of type II diabetes onset with computed tomography and electronic medical records. Lect. Notes Comput. Sci. 12445, 13–23 (2020).
Article Google Scholar
Rui, Y. et al. Integration of multimodal data for breast cancer classification using a hybrid deep learning method. Lect. Notes Computer Sci. 11643, 460–469 (2019).
Article Google Scholar
Batuhan, B. & Mehmet, T. Improving clinical outcome predictions using convolution over medical entities with multimodal learning. Artif. Intell. Med. 117, 102112 (2021).
Esra, Z. et al. Multimodal fusion strategies for outcome prediction in Stroke. In HEALTHINF 2020 - 13th International Conference on Health Informatics, Proceedings; Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020 421–428 (2020).
Rimma, P. et al. Learning probabilistic phenotypes from heterogeneous EHR data. J. Biomed. Inform. 58, 156–165 (2015).
Article Google Scholar
Leon, M. A. et al. Modeling longitudinal imaging biomarkers with parametric Bayesian multi-task learning. Hum. Brain Mapp. 40, 3982–4000 (2019).
Article Google Scholar
Paris Alexandros, L. et al. Heterogeneity and classification of recent onset psychosis and depression: A multimodal machine learning approach. Schizophr. Bull. 47, 1130–1140 (2021).
Article Google Scholar
Nikolaos, K. et al. Prediction models of functional outcomes for individuals in the clinical high-risk state for psychosis or with recent-onset depression: A multimodal, multisite machine learning analysis. JAMA Psychiatry 75, 1156–1172 (2018).
Article Google Scholar
Dongdong, S., Minghui, W. & Ao, L. A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans. Comput Biol. Bioinform. 16, 841–850 (2019).
Article Google Scholar
Karen, S. A. et al. A machine-learning framework for robust and reliable prediction of short- and long-term treatment response in initially antipsychotic-naïve schizophrenia patients based on multimodal neuropsychiatric data. Translational Psychiatry 10, 276 (2020).
Sun, M. et al. Early prediction of acute kidney injury in critical care setting using clinical notes and structured multivariate physiological measurements. Stud. Health Technol. Inf. 264, 368–372 (2019).
Google Scholar
Dennis, S. R., Simuni, T. & Luo, Y. A predictive model for Parkinson’s disease reveals candidate gene sets for progression subtype. 2020 IEEE Int. Conf. Bioinforma. Biomedicine (BIBM) 2020, 417–420 (2020).
Ming, X. et al. Accurately differentiating COVID-19, other viral infection, and healthy individuals using multimodal features via late fusion learning. J. Med. Internet Res. 23 (2021).
Min, C., Yixue, H., Kai, H., Lin, W. & Lu, W. Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017).
Article Google Scholar
Keyang, X. et al. Multimodal machine learning for automated ICD coding. Proc. Mach. Learn. Res. 106, 1–17 (2019).
Google Scholar
Liuqing, Y. et al. Deep learning based multimodal progression modeling for Alzheimer’s disease. Stat. Biopharma. Res. 13, 337–343 (2021).
Article Google Scholar
Peng, L. et al. A radiogenomics signature for predicting the clinical outcome of bladder urothelial carcinoma. Eur. Radio. 30, 547–557 (2020).
Article Google Scholar
Md Sirajus, S. et al. Multimodal spatio-temporal deep learning approach for neonatal postoperative pain assessment. Comput. Biol. Med. 129, 104150 (2021).
Jian, X. et al. Multimodal machine learning using visual fields and peripapillary circular OCT scans in detection of glaucomatous optic neuropathy. Ophthalmology 129, 171–180 (2021).
Google Scholar
Dai, Y., Yiqi, Z., Yang, W., Wenpu, Z. & Xiaoming, H. Auxiliary diagnosis of heterogeneous data of Parkinson’s disease based on improved convolution neural network. Multimed. Tools Appl. 79, 24199–24224 (2020).
Article Google Scholar
Makoto, N. et al. Accessory pathway analysis using a multimodal deep learning model. Sci. Rep. 11, 8045 (2021).
Wenhuan, Z., Anupam, G. & Daniel, H. H. On the application of advanced machine learning methods to analyze enhanced, multimodal data from persons infected with covid-19. Computation 9, 1–15 (2021).
Google Scholar
Jeremy, A. T., Kit, M. L. & Marta, I. G. Multi-dimensional predictions of psychotic symptoms via machine learning. Hum. Brain Mapp. 41, 5151–5163 (2020).
Article Google Scholar
Hossam, F., Maria, H., Mohammad, F., Haya, E. & Alaa, A. An intelligent multimodal medical diagnosis system based on patients’ medical questions and structured symptoms for telemedicine. Inform. Med. Unlocked 23, 100513 (2021).
Md Ashad, A. et al. A kernel machine method for detecting higher order interactions in multimodal datasets: Application to schizophrenia. J. Neurosci. Methods 309, 161–174 (2018).
Article Google Scholar
Luo, Y. et al. A multidimensional precision medicine approach identifies an autism subtype characterized by dyslipidemia. Nat. Med. 26, 1375–1379 (2020).
Article CAS PubMed Google Scholar
Shaker, E.-S., Jose, M. A., Islam, S. M. R., Ahmad, M. S., Kyung Sup, K. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Sci. Rep. 11, 2660 (2021).
Shih Cheng, H., Anuj, P., Roham, Z., Imon, B. & Matthew, P. L. Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: A case-study in pulmonary embolism detection. Sci. Rep. 10, 22147 (2020).
Yao, L., Mao, C. & Luo, Y. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med. Inf. Decis. Mak. 19, 71 (2019).
Article CAS Google Scholar
Velupillai, S. et al. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J. Biomed. Inf. 88, 11–19 (2018).
Article Google Scholar
Luo, Y., Uzuner, Ö. & Szolovits, P. Bridging semantics and syntax with graph algorithms—state-of-the-art of extracting biomedical relations. Brief. Bioinform. 18, 160–178 (2016).
Article PubMed PubMed Central Google Scholar
Zeng, Z., Deng, Y., Li, X., Naumann, T. & Luo, Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans. Comput Biol. Bioinform. 16, 139–153 (2018).
Article PubMed PubMed Central Google Scholar
Nikolaos, K. et al. Multimodal machine learning workflows for prediction of psychosis in patients with clinical high-risk syndromes and recent-onset depression. JAMA Psychiatry 78, 195–209 (2021).
Article Google Scholar
Petersen, R. C. et al. Alzheimer’s disease neuroimaging initiative (ADNI): Clinical characterization. Neurology 74, 201–209 (2010).
Article PubMed PubMed Central Google Scholar
Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113 (2013).
Article PubMed PubMed Central Google Scholar
Christopher, J. K., Alan, K., Mustafa, S., Greg, C. & Dominic, K. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17, 195 (2019).
Michael, A. M. et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J. Am. Acad. Dermatol. 78, 270–7.e1 (2018).
Article Google Scholar
Jannin, P. G. C. & Gibaud, B. Medical Applications of NDT Data Fusion (Springer, 2001).
Guang, Y., Qinghao, Y. & Jun, X. Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond. Inf. Fusion 77, 29–52 (2022).
Article Google Scholar
Zhang, Z., & Sejdić, E., Radiological images and machine learning: trends, perspectives, and prospects. Comput. Biol. Med. 108, 354–370 (2019).
David, L., Enrico, C., Jessica, C., Parina, S. & Farah, M. How machine learning is embedded to support clinician decision making: An analysis of FDA-approved medical devices. BMJ Health Care Inform. 28, e100301 (2021).
Luo, Y., Szolovits, P., Dighe, A. S. & Baron, J. M. Using machine learning to predict laboratory test results. Am. J. Clin. Pathol. 145, 778–788 (2016).
Article PubMed Google Scholar
Thakur, S., Choudhary, J. & Singh, D. P. Systems 435–443 (Springer, 2021).
Luo, Y., Szolovits, P., Dighe, A. S. & Baron, J. M. 3D-MICE: Integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data. J. Am. Med. Inform. Assoc. (JAMIA) 25, 645–653 (2017).
Article Google Scholar
Xue, Y., Klabjan, D. & Luo, Y. Mixture-based multiple imputation model for clinical data with a temporal dimension. 2019 IEEE International Conference on Big Data (Big Data) 2019 245–252 (IEEE, 2019).
Cao, W. et al. Brits: Bidirectional Recurrent Rmputation for Time Series. NeurIPS, 31 (2018).
Luo, Y. Evaluating the state of the art in missing data imputation for clinical data. Brief. Bioinform. 23, bbab489 (2022).
Article PubMed Google Scholar
Zhao, Q., Adeli, E. & Pohl, K. M. Training confounder-free deep learning models for medical applications. Nat. Commun. 11, 1–9 (2020).
Article Google Scholar
Luo, Y. & Mao, C. ScanMap: Supervised confounding aware non-negative matrix factorization for polygenic risk modeling. In Machine Learning for Healthcare Conference; 2020: PMLR 27–45 (2020).
Kalavathy, R. & Suresh, R. M. Pharmacovigilance from electronic medical records to report adverse events. J. Chem. Pharm. Sci. 2015, 188–191 (2015).
Google Scholar
Luo, Y. et al. Natural language processing for EHR-based pharmacovigilance: A structured review. Drug Saf. https://doi.org/10.1007/s40264-017-0558-6 (2017).
Segura Bedmar, I., Martínez, P. & Herrero Zazo, M. Semeval-2013 task 9: Extraction of drug–drug interactions from biomedical texts (ddiextraction 2013). 2013: Association for Computational Linguistics (2013).
Hammann, F. & Drewe, J. Data mining for potential adverse drug–drug interactions. Expert Opin. Drug Metab. Toxicol. 10, 665–671 (2014).
Article CAS PubMed Google Scholar
Donna, M. F. et al. American geriatrics society 2019 updated AGS beers criteria for potentially inappropriate medication use in older adults. J. Am. Geriatr. Soc. 67, 674–694 (2019).
Google Scholar
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–53. (2019).
Article CAS PubMed Google Scholar
Wang, H. et al. Using machine learning to integrate sociobehavioral factors in predicting cardiovascular-related mortality risk. Stud. Health Technol. Inf. 264, 433–437 (2019).
Google Scholar
Christof, E. Open source software in industry. IEEE Softw. 25, 52–53 (2008).
Article Google Scholar
Robert, M. S. Why develop open-source software? The role of non-pecuniary benefits, monetary rewards, and open-source licence type. Oxf. Rev. Economic Policy 23, 605–619 (2007).
Article Google Scholar
Luo, Y., Wunderink, R. G. & Lloyd-Jones, D. Proactive vs reactive machine learning in health care: Lessons from the COVID-19 pandemic. JAMA 327, 623–624 (2022).
Article CAS PubMed Google Scholar
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process Syst. 30 (2017).
Derara Duba, R., Taye Girma, D., Achim, I. & Worku Gachena, N. Diagnosis of diabetes mellitus using gradient boosting machine (Lightgbm). Diagnostics 11, 1714 (2021).
Xiaolei, S., Mingxi, L. & Zeqian, S. A novel cryptocurrency price trend forecasting model based on LightGBM. Financ. Res. Lett. 32, 101084 (2020).
Article Google Scholar
Dongzi, J., Yiqin, L., Jiancheng, Q., Zhe, C. & Zhongshu, M. SwiftIDS: Real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism. Comput. Secur. 97, 101984 (2020).
Article Google Scholar
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Nature 3, 160018 (2016).
Google Scholar
Weissler, E. H., et al. The role of machine learning in clinical research: Transforming the future of evidence generation. Trials 22, 537 (2021).
Pratik, S. et al. Artificial intelligence and machine learning in clinical development: A translational perspective. npj Digital Med. 2, 69 (2019).
Inna, K. & Simeon, S. Interpretability of machine learning solutions in public healthcare: The CRISP-ML approach. Front. Big Data 4, 660206 (2021).
Marx, V. Method of the Year: Spatially resolved transcriptomics. Nat. Methods 18, 9–14 (2021).
Article CAS PubMed Google Scholar
Zeng, Z., Li, Y., Li, Y. & Luo, Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol. 23, 1–23. (2022).
Article Google Scholar
Longo, S. K., Guo, M. G., Ji, A. L. & Khavari, P. A. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat. Rev. Genet. 22, 627–644 (2021).
Article CAS PubMed Google Scholar
PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann. Intern. Med. 169 467–473 (2018).

Download references

Acknowledgements

The study is supported in part by NIH Grants U01TR003528 and R01LM013337 to Y.L., and AG073323 to F.C.

Author information

Authors and Affiliations

Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
Adrienne Kline, Hanyin Wang, Yikuan Li, Saya Dennis, Meghan Hutch & Yuan Luo
Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
Zhenxing Xu & Fei Wang
Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
Feixiong Cheng

Authors

Adrienne Kline
View author publications
You can also search for this author in PubMed Google Scholar
Hanyin Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yikuan Li
View author publications
You can also search for this author in PubMed Google Scholar
Saya Dennis
View author publications
You can also search for this author in PubMed Google Scholar
Meghan Hutch
View author publications
You can also search for this author in PubMed Google Scholar
Zhenxing Xu
View author publications
You can also search for this author in PubMed Google Scholar
Fei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Feixiong Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Yuan Luo
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.K. led the review process, performed data extraction, performed the computation analysis, figure generation, writing, and dashboard creation. H.W., Y.L., S.D., and M.H. performed data extraction. Z.X. synthesized limitations of the studies. F.W. and F.C. performed proof reading and content curation. Y.L. conceived the review, oversaw the review process, and provided necessary feedback, proof reading, and content curation.

Corresponding author

Correspondence to Yuan Luo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kline, A., Wang, H., Li, Y. et al. Multimodal machine learning in precision health: A scoping review. npj Digit. Med. 5, 171 (2022). https://doi.org/10.1038/s41746-022-00712-8

Download citation

Received: 20 June 2022
Accepted: 14 October 2022
Published: 07 November 2022
DOI: https://doi.org/10.1038/s41746-022-00712-8

This article is cited by

Generative AI in healthcare: an implementation science informed translational path on application, integration and governance
- Sandeep Reddy
Implementation Science (2024)
PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies
- Xinzhi Yao
- Sizhuo Ouyang
- Jingbo Xia
Genome Medicine (2024)
Neural multi-task learning in drug design
- Stephan Allenspach
- Jan A. Hiss
- Gisbert Schneider
Nature Machine Intelligence (2024)
Recompensation in cirrhosis: unravelling the evolving natural history of nonalcoholic fatty liver disease
- Gong Feng
- Luca Valenti
- Ming-Hua Zheng
Nature Reviews Gastroenterology & Hepatology (2024)
Multimodal machine learning for modeling infant head circumference, mothers’ milk composition, and their shared environment
- Martin Becker
- Kelsey Fehr
- Meghan B. Azad
Scientific Reports (2024)

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Topic Modeling

Model validation, techniques, and modalities used

Early fusion

Intermediate fusion

Late fusion

Mixed fusion

Clinical relevance

Discussion

Methods

Search strategy and selection criteria

Data extracted

Reporting summary

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links