Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences

Fueled by breakthrough technology developments, the biological, biomedical, and behavioral sciences are now collecting more data than ever before. There is a critical need for time- and cost-efficient strategies to analyze and interpret these data to advance human health. The recent rise of machine learning as a powerful technique to integrate multimodality, multifidelity data, and reveal correlations between intertwined phenomena presents a special opportunity in this regard. However, machine learning alone ignores the fundamental laws of physics and can result in ill-posed problems or non-physical solutions. Multiscale modeling is a successful strategy to integrate multiscale, multiphysics data and uncover mechanisms that explain the emergence of function. However, multiscale modeling alone often fails to efficiently combine large datasets from different sources and different levels of resolution. Here we demonstrate that machine learning and multiscale modeling can naturally complement each other to create robust predictive models that integrate the underlying physics to manage ill-posed problems and explore massive design spaces. We review the current literature, highlight applications and opportunities, address open questions, and discuss potential challenges and limitations in four overarching topical areas: ordinary differential equations, partial differential equations, data-driven approaches, and theory-driven approaches. Towards these goals, we leverage expertise in applied mathematics, computer science, computational biology, biophysics, biomechanics, engineering mechanics, experimentation, and medicine. Our multidisciplinary perspective suggests that integrating machine learning and multiscale modeling can provide new insights into disease mechanisms, help identify new targets and treatment strategies, and inform decision making for the benefit of human health.


MOTIVATION
Wouldn't it be great to have a virtual replica of ourselves to explore our interaction with the real world in real time? A living, digital representation of ourselves that integrates machine learning and multiscale modeling to continuously learn and dynamically update itself as our environment changes in real life? A virtual mirror of ourselves that allows us to simulate our personal medical history and health condition using data-driven analytical algorithms and theory-driven physical knowledge? These are the objectives of the Digital Twin [Madni et al., 2019]. In health care, a Digital Twin would allow us to improve health, sports, and education by integrating population data with personalized data, all adjusted in real time on the basis of continuously recorded health and lifestyle parameters from various sources [Bruynseels et al., 2018; Liu et al., 2019; Topol, 2019]. But, realistically, how long will it take before we have a Digital Twin by our side? Can we leverage our knowledge of machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences to accelerate developments towards a Digital Twin? Do we already have digital organ models that we could integrate into a full Digital Twin? And what are the challenges, open questions, opportunities, and limitations? Where do we even begin?
Fortunately, we do not have to start entirely from scratch. Over the past two decades, multiscale modeling has emerged as a promising tool to build individual organ models by systematically integrating knowledge from the tissue, cellular, and molecular levels, in part fueled by initiatives like the United States Federal Interagency Modeling and Analysis Group (IMAG). Depending on the scale of interest, multiscale modeling approaches fall into two categories: ordinary differential equation-based and partial differential equation-based approaches. Within both categories, we can distinguish data-driven and theory-driven machine learning approaches. Here we discuss these four approaches towards developing a Digital Twin.

Ordinary differential equations characterize the temporal evolution of biological systems.
Ordinary differential equations are widely used to simulate the integral response of a system during development, disease, environmental changes, or pharmaceutical interventions. Systems of ordinary differential equations allow us to explore the dynamic interplay of key characteristic features to understand the sequence of events, the progression of disease, or the timeline of treatment. Applications span the molecular, cellular, tissue, and organ levels all the way to the population level, and include immunology to correlate protein-protein interactions and immune response [Rhodes et al., 2018], microbiology to correlate growth rates and bacterial competition, metabolic networks to correlate genome and physiome [Cuperlovic-Culf, 2018; Shaked et al., 2016], neuroscience to correlate protein misfolding to biomarkers of neurodegeneration [Weickenmeier et al., 2018], oncology to correlate perturbations to tumorigenesis [Nazari et al., 2018], and epidemiology to correlate disease spread to public health. In essence, ordinary differential equations are a powerful tool to study the dynamics of biological, biomedical, and behavioral systems in an integral sense, irrespective of the regional distribution of the underlying features.
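As a concrete instance of the epidemiology application above, the sketch below integrates the classical susceptible-infected-recovered (SIR) model with a hand-written fourth-order Runge-Kutta scheme. The model choice, the rates `beta` and `gamma`, and the helper names `sir_rhs` and `rk4` are illustrative assumptions, not taken from the cited studies; the point is only that a small ODE system already yields an integral, spatially lumped time course.

```python
import numpy as np


def sir_rhs(state, beta, gamma):
    """SIR right-hand side for population fractions (S, I, R)."""
    s, i, r = state
    infection = beta * s * i   # new infections per unit time
    recovery = gamma * i       # recoveries per unit time
    return np.array([-infection, infection - recovery, recovery])


def rk4(rhs, y0, t_end, n_steps, **params):
    """Classical fourth-order Runge-Kutta integration of dy/dt = rhs(y, **params)."""
    dt = t_end / n_steps
    y = np.asarray(y0, dtype=float)
    history = [y]
    for _ in range(n_steps):
        k1 = rhs(y, **params)
        k2 = rhs(y + 0.5 * dt * k1, **params)
        k3 = rhs(y + 0.5 * dt * k2, **params)
        k4 = rhs(y + dt * k3, **params)
        y = y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        history.append(y)
    return np.array(history)


# Hypothetical rates (per day): transmission beta = 0.3, recovery gamma = 0.1.
trajectory = rk4(sir_rhs, [0.99, 0.01, 0.0], t_end=160.0, n_steps=1600, beta=0.3, gamma=0.1)
s, i, r = trajectory.T
print(f"peak infected fraction: {i.max():.3f}")
print(f"recovered fraction at t = 160: {r[-1]:.3f}")
```

Because the three right-hand sides sum to zero, the total population S + I + R stays constant up to round-off, a simple conservation check worth keeping in any ODE model.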

Partial differential equations characterize the spatio-temporal evolution of biological systems.
In contrast to ordinary differential equations, partial differential equations are typically used to study spatial patterns of inherently heterogeneous, regionally varying fields, for example, the flow of blood through the cardiovascular system [Kissas et al., 2018] or the elastodynamic contraction of the heart [Baillargeon et al., 2014]. In general, these equations are nonlinear and highly coupled, and we usually employ computational tools, for example, finite difference or finite element methods, to approximate their solution numerically. Finite element methods have a long history of success at combining ordinary differential equations and partial differential equations to pass knowledge across the scales [De et al., 2014]. They are naturally tailored to represent the small-scale behavior locally through constitutive laws, using ordinary differential equations and spatial derivatives, and to embed this knowledge globally into physics-based conservation laws, using partial differential equations. Assuming we know the governing ordinary and partial differential equations, finite element models can predict the behavior of the system from given initial and boundary conditions measured at a few selected points. This approach is incredibly powerful, but requires that we actually know the physics of the system, for example, the underlying kinematic equations and the balance of mass, momentum, or energy. Yet, to close the system of equations, we need constitutive equations that characterize the behavior of the system, which we have to calibrate either with experimental data or with data generated via multiscale modeling.
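To make the finite-difference route concrete, here is a minimal sketch, assuming the simplest spatially varying model, one-dimensional diffusion u_t = D u_xx with zero-flux ends. It is written in conservative (flux) form so that the discrete balance of mass holds exactly; the function name `diffuse`, the diffusivity, and the initial profile are hypothetical choices for illustration.

```python
import numpy as np


def diffuse(u0, D, dx, t_end, safety=0.4):
    """Explicit conservative finite-difference solver for u_t = D u_xx with zero-flux ends."""
    dt_max = safety * dx**2 / D          # stability requires D * dt / dx**2 <= 1/2
    n_steps = int(np.ceil(t_end / dt_max))
    dt = t_end / n_steps                 # shrink dt slightly so we land exactly on t_end
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        flux = -D * np.diff(u) / dx                  # Fick's law at interior cell faces
        flux = np.concatenate(([0.0], flux, [0.0]))  # no-flux boundary conditions
        u -= dt / dx * np.diff(flux)                 # discrete balance of mass per cell
    return u


n_cells, length = 100, 1.0
dx = length / n_cells
x = (np.arange(n_cells) + 0.5) * dx                  # cell centers
u0 = np.exp(-(((x - 0.3) / 0.05) ** 2))              # hypothetical initial concentration peak
u = diffuse(u0, D=1e-3, dx=dx, t_end=10.0)
print(f"mass before: {u0.sum() * dx:.6f}, after: {u.sum() * dx:.6f}")
print(f"peak before: {u0.max():.3f}, after: {u.max():.3f}")
```

The time step is tied to the grid spacing through the explicit stability limit D Δt/Δx² ≤ 1/2, which is one reason implicit schemes or finite element formulations are often preferred for fine grids and stiff, coupled problems.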

Multiscale modeling seeks to predict the behavior of biological, biomedical, and behavioral systems.
Toward this goal, the main objective of multiscale modeling is to identify causality and establish causal relations between data. Our experience has taught us that most engineering materials display an elastic, viscoelastic, or elastoplastic constitutive behavior. However, biological and biomedical materials are often more complex, simply because they are alive [Ambrosi et al., 2011]. They continuously interact with and adapt to their environment and dynamically respond to biological, chemical, or mechanical cues [Humphrey et al., 2014]. Unlike classical engineering materials, living matter has amazing abilities to generate force, actively contract, rearrange its architecture, and grow or shrink in size [Goriely, 2017]. To appropriately model these phenomena, we not only have to rethink the underlying kinetics, the balance of mass, and the laws of thermodynamics, but often have to include the biological, chemical, or electrical fields that act as stimuli of this living response [Lorenzo et al., 2016]. This is where multiphysics multiscale modeling becomes important [Southern et al., 2008; Chabiniok et al., 2016]: Multiscale modeling allows us to thoroughly probe biologically relevant phenomena at a smaller scale and seamlessly embed the relevant mechanisms at the larger scale to predict the dynamics of the overall system [Hunt et al., 2018]. Importantly, rather than making phenomenological assumptions about the behavior at the larger scale, multiscale models postulate that the behavior at the larger scale emerges naturally from the collective action at the smaller scales. Yet, this attention to detail comes at a price. While multiscale models can provide unprecedented insight into mechanistic detail, they are not only computationally expensive, but also introduce a large number of unknowns, both in the form of unknown physics and unknown parameters [Raissi & Karniadakis, 2018; Raissi et al., 2019]. Fortunately, with the increasing ability to record and store information, we now have access to massive amounts of biological and biomedical data that allow us to systematically discover details about these unknowns.
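The idea that larger-scale behavior emerges from collective action at smaller scales can be illustrated with a deliberately simple example, assuming a one-dimensional bar built from heterogeneous elastic elements in series. No macroscopic constitutive law is postulated; the effective modulus is read off from the simulated micro-scale response. The setup and the names `bar_elongation` and `E_micro` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_elements, bar_length, area = 1000, 1.0, 1.0
# Hypothetical micro-scale moduli varying randomly between compliant and stiff regions.
E_micro = rng.uniform(1.0, 10.0, size=n_elements)


def bar_elongation(force):
    """Micro-scale model: elements in series carry the same force and their stretches add up."""
    segment = bar_length / n_elements
    return np.sum(force * segment / (E_micro * area))


force = 2.0
delta = bar_elongation(force)
E_emergent = (force / area) / (delta / bar_length)   # macroscopic stress / strain
print(f"emergent modulus: {E_emergent:.3f}")
print(f"naive average of micro moduli: {E_micro.mean():.3f}")
```

The emergent modulus equals the harmonic mean of the micro-scale moduli, which is always below their arithmetic mean, so a naive phenomenological average would overestimate the stiffness of the bar.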

Machine learning seeks to infer the dynamics of biological, biomedical, and behavioral systems.
Toward this goal, the main objective of machine learning is to identify correlations among big data. The focus in biology, biomedicine, and the behavioral sciences is currently shifting from solving forward problems based on sparse data towards solving inverse problems to explain large data sets [Raissi et al., 2018]. Today, multiscale simulations in the biological, biomedical, and behavioral sciences seek to infer the behavior of the system, assuming that we have access to massive amounts of data, while the governing equations and their parameters are not precisely known [Brunton et al., 2016; Raissi et al., 2017d; Wang et al., 2019]. This is where machine learning becomes critical: Machine learning allows us to systematically preprocess massive amounts of data, integrate and analyze them from different input modalities and different levels of fidelity, identify correlations, and infer the dynamics of the overall system. Similarly, we can use machine learning to quantify the agreement of correlations, for example, by comparing computationally simulated and experimentally measured features across multiple scales using Bayesian inference and uncertainty quantification [Sahli et al., 2019].
Figure 1. Machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences. Machine learning and multiscale modeling interact on the parameter level via constraining parameter spaces, identifying parameter values, and analyzing sensitivity, and on the system level via exploiting the underlying physics, constraining design spaces, and identifying system dynamics. Machine learning provides the appropriate tools towards supplementing training data, preventing overfitting, managing ill-posed problems, creating surrogate models, and quantifying uncertainty, with the ultimate goal being to explore massive design spaces and identify correlations. Multiscale modeling integrates the underlying physics towards identifying relevant features, exploring their interaction, elucidating mechanisms, bridging scales, and understanding the emergence of function, with the ultimate goal of predicting system dynamics and identifying causality.
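Comparing simulated and measured features with Bayesian inference can be sketched on a toy problem, assuming a one-parameter exponential-decay model, Gaussian measurement noise of known size, and a uniform prior. The posterior is evaluated on a grid, which is only practical for one or two parameters; the data are synthetic and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, 3.0, 20)
k_true, sigma = 1.2, 0.01
# Synthetic "measured" feature: an exponential decay observed with Gaussian noise.
y_obs = np.exp(-k_true * t_obs) + sigma * rng.normal(size=t_obs.size)

k_grid = np.linspace(0.1, 3.0, 2000)               # uniform prior on this interval
y_sim = np.exp(-np.outer(k_grid, t_obs))           # simulated feature for every candidate k
log_like = -0.5 * np.sum((y_obs - y_sim) ** 2, axis=1) / sigma**2
weights = np.exp(log_like - log_like.max())        # subtract the max to avoid underflow
weights /= weights.sum()                           # discrete posterior on the uniform grid

k_mean = np.sum(weights * k_grid)
k_std = np.sqrt(np.sum(weights * (k_grid - k_mean) ** 2))
print(f"posterior k = {k_mean:.3f} +/- {k_std:.3f}")
```

The posterior standard deviation serves as the uncertainty estimate; for higher-dimensional parameter spaces one would typically switch to sampling methods such as Markov chain Monte Carlo.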

Machine learning and multiscale modeling mutually complement one another.
Where machine learning reveals correlation, multiscale modeling can probe whether the correlation is causal; where multiscale modeling identifies mechanisms, machine learning, coupled with Bayesian methods, can quantify uncertainty. This natural synergy presents exciting challenges and new opportunities in the biological, biomedical, and behavioral sciences [Lytton et al., 2017]. On a more fundamental level, there is a pressing need to develop the appropriate theories to integrate machine learning and multiscale modeling. For example, it seems intuitive to build physics-based knowledge in the form of partial differential equations, boundary conditions, and constraints a priori into a machine learning approach [Raissi et al., 2019]. Especially when the available data are limited, we can increase the robustness of machine learning by including physical constraints such as conservation, symmetry, or invariance. On a more translational level, there is a need to integrate data from different modalities to build predictive simulation tools of biological systems [Perdikaris & Karniadakis, 2016]. For example, it seems reasonable to assume that experimental data from cell- and tissue-level experiments, animal models, and patient recordings are strongly correlated and obey similar physics-based laws, even if they do not originate from the same system. Naturally, while data and theory go hand in hand, some of the approaches to integrate information are more data-driven, seeking to answer questions about the quality of the data, identify missing information, or supplement sparse training data [Tartakovsky et al., 2018a, 2018b], while others are more theory-driven, seeking to answer questions about robustness and efficiency, analyze sensitivity, quantify uncertainty, and choose appropriate learning tools.
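One way to see how a physical constraint adds robustness when data are scarce is a small least-squares problem, assuming the governing law du/dt = -k u with known k. Instead of a neural network, this hypothetical sketch uses a polynomial ansatz so that the fit stays linear; the physics enters as extra residual rows evaluated at collocation points, the same idea that physics-informed networks use in their loss function.

```python
import numpy as np

k = 1.0          # assumed known rate in the physical law du/dt = -k u
degree = 6       # polynomial ansatz u(t) = sum_j c_j t**j
weight = 1.0     # relative weight of the physics residual (hypothetical choice)

t_data = np.array([0.0, 0.5, 1.0])      # three sparse "measurements"
u_data = np.exp(-k * t_data)
t_col = np.linspace(0.0, 2.0, 50)       # collocation points where the law is enforced
t_test = np.linspace(0.0, 2.0, 101)     # includes extrapolation beyond the data


def basis(t):
    return np.vander(t, degree + 1, increasing=True)   # columns t**0 ... t**degree


def basis_dt(t):
    d = np.zeros((t.size, degree + 1))
    d[:, 1:] = np.vander(t, degree, increasing=True) * np.arange(1, degree + 1)  # j * t**(j-1)
    return d


# Data rows only: underdetermined, so lstsq returns the minimum-norm interpolant.
c_data = np.linalg.lstsq(basis(t_data), u_data, rcond=None)[0]

# Data rows plus physics rows enforcing du/dt + k u = 0 at the collocation points.
A = np.vstack([basis(t_data), weight * (basis_dt(t_col) + k * basis(t_col))])
rhs = np.concatenate([u_data, np.zeros(t_col.size)])
c_phys = np.linalg.lstsq(A, rhs, rcond=None)[0]

exact = np.exp(-k * t_test)
err_data = np.max(np.abs(basis(t_test) @ c_data - exact))
err_phys = np.max(np.abs(basis(t_test) @ c_phys - exact))
print(f"max error, data only: {err_data:.2e}; with physics constraint: {err_phys:.2e}")
```

With only three data points, the data-only fit is underdetermined and extrapolates arbitrarily, whereas the physics rows pin down the behavior between and beyond the measurements. The `weight` parameter balances the two terms and would need tuning with noisy data.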
Figure 1 illustrates the integration of machine learning and multiscale modeling on the parameter level by constraining their spaces, identifying values, and analyzing their sensitivity, and on the system level by exploiting the underlying physics, constraining design spaces, and identifying system dynamics. Machine learning provides the appropriate tools for supplementing training data, preventing overfitting, managing ill-posed problems, creating surrogate models, and quantifying uncertainty. Multiscale modeling integrates the underlying physics for identifying relevant features, exploring their interaction, elucidating mechanisms, bridging scales, and understanding the emergence of function. We have structured this review around four distinct but overlapping methodological areas: ordinary and partial differential equations, and data- and theory-driven machine learning. These four themes roughly map onto the four corners of the data-physics space, where the amount of available data increases from top to bottom and physical knowledge increases from left to right. For each area, we identify challenges, open questions, and opportunities, and highlight various examples from the life sciences. For convenience, we summarize the most important terms and technologies associated with machine learning, with examples from multiscale modeling, in Box 1. We envision that our article will spark discussion and inspire scientists in the fields of machine learning and multiscale modeling to join forces towards creating tools that reliably and robustly predict the behavior of biological, biomedical, and behavioral systems for the benefit of human health.

CHALLENGES
A major challenge in the biological, biomedical, and behavioral sciences is to understand systems for which the underlying data are incomplete and the physics are not yet fully understood. In other words, with a complete set of high-resolution data, we could apply machine learning to explore design spaces and identify correlations; with a validated and calibrated set of physics equations and material parameters, we could apply multiscale modeling to predict system dynamics and identify causality. By integrating machine learning and multiscale modeling, we can leverage the potential of both, with the ultimate goal of providing quantitative predictive insight into biological systems. Figure 2 illustrates how we could integrate machine learning and multiscale modeling to better understand the cardiac system.
Active learning is a supervised learning approach in which the algorithm actively chooses the input training points. When applied to classification, it selects new inputs that lie near the classification boundary or that minimize the variance. Example: Classification of arrhythmogenic risk [Sahli Costabal et al., 2019c].
Bayesian inference is a method of statistical inference that uses Bayes' theorem to update the probability of a hypothesis as more information becomes available. Examples: Selecting models and identifying parameters of liver [Madireddy et al., 2015], brain [Mihai et al., 2018], and cardiac tissue [Sahli Costabal et al., 2019].
Classification is a supervised learning approach in which the algorithm learns from a training set of correctly classified observations and uses this learning to classify new observations, where the output variable is discrete. Examples: Classifying the effects of individual single nucleotide polymorphisms on depression [Athreya et al., 2019]; of ion channel blockage on arrhythmogenic risk in drug development [Sahli Costabal et al., 2019c]; and of chemotherapeutic agents in personalized cancer medicine [Deist et al., 2019].
Clustering is an unsupervised learning method that organizes members of a dataset into groups that share common properties. Examples: Clustering the effects of simulated treatments [Lin et al., 2018; Neymotin et al., 2016].
Convolutional neural networks are neural networks that apply the mathematical operation of convolution, a linear operation with small kernels whose weights are shared across the input, rather than a general dense linear transformation, to generate the next layer. Examples: Predicting mechanical properties using microscale volume elements through deep learning [Yang et al., 2018a], classifying red blood cells in sickle cell anemia [Xu et al., 2017], and inferring the solution of multiscale partial differential equations [Zhu et al., 2019].
Deep neural networks, or deep learning, are a powerful form of machine learning that uses neural networks with a multiplicity of layers. Examples: Biologically-inspired learning, where deep learning aims to replicate mechanisms of neuronal interactions in the brain [Marblestone et al., 2016], and predicting the sequence specificities of DNA- and RNA-binding proteins [Alipanahi et al., 2015].
Domain randomization is a technique for randomizing the properties of simulated images, for example, textures, lighting, or backgrounds, so that the true image is also recognized as a realization of this randomized space. Example: Supplementing training data [Tremblay et al., 2018].
Dropout neural networks use a regularization method that avoids overfitting by randomly deleting, or dropping, units along with their connections during training. Examples: Detecting retinal diseases and making diagnoses with various qualities of retinal image data [Rajan et al., 2018].
Dynamic programming is a mathematical optimization formalism that enables the simplification of a complicated decision-making problem by recursively breaking it into simpler sub-problems. Example: De novo peptide sequencing via tandem mass spectrometry and dynamic programming [Chen et al., 2001].
Genetic programming is a heuristic search technique of evolving programs that starts from a population of random unfit programs and applies operations similar to natural genetic processes to identify a suitable program. Example: Predicting metabolic pathway dynamics from time-series multi-omics data [Costello & Martin, 2018].
Physics-informed neural networks are neural networks that solve supervised learning tasks while respecting physical constraints. Examples: Diagnosing cardiovascular disorders non-invasively using four-dimensional magnetic resonance images of blood flow and arterial wall displacements [Kissas et al., 2019] and creating computationally efficient surrogates for velocity and pressure fields in intracranial aneurysms [Raissi et al., 2018].

Generative models
Recurrent neural networks are a class of neural networks that incorporate a notion of time by accounting not only for current data, but also for history with tunable extents of memory. Example: Identifying unknown constitutive relations in ordinary differential equation systems [Hagge et al., 2017].
Reinforcement learning is a technique that circumvents the notions of supervised and unsupervised learning by exploring and combining decisions and actions in dynamic environments to maximize some notion of cumulative reward. Examples: Understanding common learning modes in biological, cognitive, and artificial systems through the lens of reinforcement learning [Botvinick et al., 2019; Neftci et al., 2019].
Regression is a supervised learning approach in which the algorithm learns from a training set of correctly identified observations and then uses this learning to evaluate new observations, where the output variable is continuous. Example: Exploring the interplay between drug concentration and drug toxicity in cardiac electrophysiology [Sahli Costabal et al., 2019b].
Supervised learning defines the task of learning a function that maps an input to an output based on example input-output pairs. Typical examples include classification and regression tasks.
System identification refers to a collection of techniques that identify the governing equations from data on a steady-state or dynamical system. Examples: Inferring operators that form ordinary [Mangan et al., 2016] and partial differential equations [Wang et al., 2019].
Uncertainty quantification is the science of quantitative characterization and reduction of uncertainties that seeks to determine the likelihood of certain outputs if the inputs are not exactly known. Examples: Quantifying the effects of experimental uncertainty in heart failure [Peirlinck et al., 2019] or the effects of estimated material properties on stress profiles in reconstructive surgery [Lee et al., 2018].
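The system identification entry above can be illustrated with a minimal sparse-regression sketch in the spirit of [Brunton et al., 2016], assuming noise-free synthetic data from a damped linear oscillator and a small library of polynomial candidate terms. The helper `stlsq` (sequentially thresholded least squares), the threshold, and the library are illustrative choices, not a reference implementation.

```python
import numpy as np

# Synthetic data from a damped linear oscillator (treated as "unknown" by the algorithm):
# dx/dt = -0.1 x + 2 y,  dy/dt = -2 x - 0.1 y, exact solution for x(0) = 2, y(0) = 0.
t = np.linspace(0.0, 10.0, 10001)
x = 2.0 * np.exp(-0.1 * t) * np.cos(2.0 * t)
y = -2.0 * np.exp(-0.1 * t) * np.sin(2.0 * t)
X = np.column_stack([x, y])
dXdt = np.gradient(X, t, axis=0)   # finite-difference estimate of the time derivatives

# Library of candidate terms; the governing equations are sought as sparse combinations.
names = ["1", "x", "y", "x^2", "xy", "y^2"]
theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def stlsq(theta, dXdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], dXdt[:, j], rcond=None)[0]
    return xi


xi = stlsq(theta, dXdt)
for j, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = " + ".join(f"{xi[k, j]:.3f} {names[k]}" for k in range(len(names)) if xi[k, j] != 0.0)
    print(f"{lhs} = {terms}")
```

With real measurements, the derivative estimate and the threshold become the critical choices, since noise in `dXdt` directly produces spurious library terms.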

OPEN QUESTIONS AND OPPORTUNITIES
Numerous open questions and opportunities emerge from integrating machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences. We address some of the most urgent ones below.

Managing ill-posed problems.
Can we solve ill-posed inverse problems that arise during parameter or system identification? Unfortunately, many of the inverse problems for biological systems are ill posed. Mathematically speaking, they constitute boundary value problems with unknown boundary values. Classical mathematical approaches, which assume a well-posed problem with a unique solution that depends continuously on the data, are not suitable in these cases. Methods for backward uncertainty quantification could potentially deal with the uncertainty involved in inverse problems, but these methods are difficult to scale to realistic settings. In view of the high-dimensional input space and the inherent uncertainty of biological systems, inverse problems will always be challenging. For example, it is difficult to determine if there are multiple solutions or no solutions at all, or to quantify the confidence in the prediction of an inverse problem with high-dimensional input data. Does the inherent regularization in the loss function of neural networks allow us to deal with ill-posed inverse partial differential equations without boundary or initial conditions and discover hidden states?
Identifying missing information. Are the parameters of the proposed model sufficient to provide a basic set to produce higher-scale system dynamics? Multiscale simulations and generative networks can be set up to work in parallel, alongside the experiment, to provide an independent confirmation of parameter sensitivity. For example, circadian rhythm generators provide relatively simple dynamics but have very complex dependence on numerous underlying parameters, which multiscale modeling can reveal. An open opportunity exists to use generative models to identify both the underlying low dimensionality of the dynamics and the high dimensionality associated with parameter variation. Inadequate multiscale models could then be identified by the failure of generative model predictions.
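For the ill-posed inverse problems raised under Managing ill-posed problems, a classical remedy is Tikhonov regularization, which trades a small bias for stability. The sketch below assumes a hypothetical Gaussian-blur forward operator and a known smooth field; it contrasts unregularized inversion, where noise is amplified by the tiny singular values, with the regularized solution. The regularization weight `lam` is chosen by hand here.

```python
import numpy as np

n = 50
s = np.linspace(0.0, 1.0, n)
idx = np.arange(n)
# Hypothetical forward operator: Gaussian blurring, a classic ill-conditioned map.
A = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)
x_true = np.sin(np.pi * s)                      # unknown field we want to recover
rng = np.random.default_rng(2)
b = A @ x_true + 1e-3 * rng.normal(size=n)      # indirect, slightly noisy observations

# Unregularized inversion: tiny singular values of A amplify the measurement noise.
x_naive = np.linalg.lstsq(A, b, rcond=None)[0]

# Tikhonov regularization: minimize ||A x - b||^2 + lam ||x||^2.
lam = 1e-3
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

err_naive = np.linalg.norm(x_naive - x_true)
err_tik = np.linalg.norm(x_tik - x_true)
print(f"condition number of A: {np.linalg.cond(A):.1e}")
print(f"error without / with regularization: {err_naive:.1e} / {err_tik:.1e}")
```

In practice, `lam` is usually selected by criteria such as the discrepancy principle or the L-curve; the regularization implicit in neural network loss functions can play a comparable stabilizing role.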
Creating surrogate models. Can we use generative adversarial networks to create new test data sets for multiscale models? Conversely, can we use multiscale modeling to provide training or test instances to create new surrogate models using deep learning? Once trained, deep learning surrogates could provide answers much more quickly than complex and sophisticated multiscale models. This could, for example, have significant applications in predicting pharmaceutical efficacy for patients with particular genetic inheritance in personalized medicine.
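A surrogate model can be sketched without any deep learning machinery, assuming a hypothetical "expensive" simulator (here a fine-step forward Euler solve of du/dt = -k u up to t = 1) and a Chebyshev polynomial fit as the cheap emulator. The same train-on-simulations, evaluate-cheaply workflow applies when the surrogate is a neural network.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def expensive_model(k, n_steps=5000):
    """Stand-in for a costly simulation: forward Euler for du/dt = -k u up to t = 1."""
    dt = 1.0 / n_steps
    u = 1.0
    for _ in range(n_steps):
        u -= dt * k * u
    return u


# Train the surrogate on a modest number of simulator runs.
k_train = np.linspace(0.5, 2.0, 15)
u_train = np.array([expensive_model(k) for k in k_train])
surrogate = Chebyshev.fit(k_train, u_train, deg=8)

# Check it against fresh simulator runs inside the sampled range.
k_test = np.linspace(0.5, 2.0, 7)
u_test = np.array([expensive_model(k) for k in k_test])
max_err = np.max(np.abs(surrogate(k_test) - u_test))
print(f"max surrogate error on test inputs: {max_err:.1e}")
```

Once fitted, each surrogate evaluation costs a handful of floating-point operations instead of thousands of time steps; the catch is that it is only trustworthy inside the sampled parameter range, here k in [0.5, 2].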
Discretizing space and time. Can we remove or automate the tyranny of grid generation in conventional methods? Discretization of complex and moving three-dimensional domains remains a time- and labor-intensive challenge. It generally requires specific expertise and many hours of dedicated labor, and has to be re-done for every individual model. This becomes particularly relevant when creating personalized models with complex geometries at multiple spatial and temporal scales. While many efforts in machine learning are devoted to solving partial differential equations in a given domain, new opportunities arise for machine learning when dealing directly with the creation of the discrete problem. This includes automatic mesh generation, meshless interpolation, and parameterization of the domain itself as one of the inputs for the machine learning algorithm. Neural networks constrained by physics remove the notion of a mesh, but retain the more fundamental concept of basis functions: They impose the conservation laws of mass, momentum, and energy at, e.g., collocation points that, while neither connected through a regular lattice nor through an unstructured grid, serve to determine the parameters that define the basis functions.
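The mesh-free idea of imposing the governing equation at scattered collocation points and solving for the parameters of a set of basis functions can be sketched with a linear basis, assuming a one-dimensional Poisson problem with a known exact solution. Chebyshev polynomials stand in for the neural network here, and the problem, point count, and degree are hypothetical; only the structure (scattered points, residual equations, boundary rows) mirrors the text.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

deg = 14
rng = np.random.default_rng(3)
x_col = np.sort(rng.uniform(0.0, 1.0, 60))    # scattered collocation points: no grid, no mesh


def to_t(x):
    return 2.0 * x - 1.0                        # map [0, 1] onto the Chebyshev interval [-1, 1]


def basis(x):
    return C.chebvander(to_t(x), deg)           # basis functions T_0 ... T_deg


def basis_xx(x):
    """Second x-derivatives of the basis functions; (dt/dx)**2 = 4 by the chain rule."""
    cols = []
    for j in range(deg + 1):
        e_j = np.zeros(deg + 1)
        e_j[j] = 1.0
        cols.append(4.0 * C.chebval(to_t(x), C.chebder(e_j, 2)))
    return np.column_stack(cols)


# Model problem (hypothetical): u''(x) = -pi^2 sin(pi x), u(0) = u(1) = 0, exact u = sin(pi x).
f = -np.pi**2 * np.sin(np.pi * x_col)
A = np.vstack([basis_xx(x_col), basis(np.array([0.0, 1.0]))])   # residual rows + boundary rows
b = np.concatenate([f, [0.0, 0.0]])
coef = np.linalg.lstsq(A, b, rcond=None)[0]                      # basis-function parameters

x_test = np.linspace(0.0, 1.0, 201)
max_err = np.max(np.abs(basis(x_test) @ coef - np.sin(np.pi * x_test)))
print(f"max error of the collocation solution: {max_err:.2e}")
```

Replacing the linear basis by a neural network turns this linear least-squares problem into a nonlinear optimization, but the role of the collocation points stays the same.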
Bridging the scales. Can machine learning provide scale bridging in cases where a relatively clean separation of scales is possible? For example, in cancer, machine learning could be used to explore responses of both immune cells and tumor cells based on single-cell data. This example points towards opportunities to build a multiscale model on the families of solutions to codify the evolution of the tumor at the organ or metastasis scales.

Supplementing training data.
Can we use simulated data to supplement training data? Supervised learning, as used in deep networks, is a powerful technique, but requires large amounts of training data. Recent studies have shown that, in the area of object detection in image analysis, simulation augmented by domain randomization can be used successfully as a supplement to existing training data. In areas where multiscale models are well-developed, simulations across vast regions of parameter space can, for example, supplement existing training data for nonlinear diffusion models to provide physics-informed machine learning. Similarly, multiscale models can be used in biological, biomedical, and behavioral systems to augment insufficient experimental or clinical data sets.
Quantifying uncertainty. Can theory-driven machine learning approaches enable the reliable characterization of predictive uncertainty and pinpoint its sources? Uncertainty quantification is the backbone of decision-making. This has many practical applications, such as decision-making in the clinic, the robust design of synthetic biology pathways, drug target identification, and drug risk assessment. There are also opportunities to use uncertainty quantification to guide the informed, targeted acquisition of new data.
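Forward uncertainty quantification by Monte Carlo sampling is perhaps the simplest version of this idea. The sketch assumes a hypothetical log-normally distributed stiffness and a trivial forward model, displacement = force / stiffness, and propagates the parameter uncertainty to a mean and a 95% interval of the output.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 100_000
force = 10.0                   # applied load (arbitrary units)
mu, s = np.log(50.0), 0.2      # hypothetical log-normal stiffness: median 50, log-std 0.2

stiffness = rng.lognormal(mean=mu, sigma=s, size=n_samples)   # sample the uncertain input
displacement = force / stiffness                              # forward model, once per sample

mean_u = displacement.mean()
lo, hi = np.percentile(displacement, [2.5, 97.5])
print(f"displacement: mean {mean_u:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

For this model the mean has the closed form force · exp(-mu + s²/2), which makes the sampling estimate easy to check; for multiscale models each sample is a full simulation, which is where surrogate models become attractive.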
Exploring massive design spaces. Can theory-driven machine learning approaches uncover meaningful and compact representations for complex interconnected processes and, subsequently, enable the cost-effective exploration of vast combinatorial spaces? While this is already common in the design of biomolecules with target properties in drug development, there are many other applications in biology and biomedicine that could benefit from these technologies.
Elucidating mechanisms. Can theory-driven machine learning approaches enable the discovery of interpretable models that can not only explain data, but also elucidate mechanisms, distill causality, and help us probe interventions and counterfactuals in complex multiscale systems? For instance, causal inference generally uses various statistical measures such as partial correlation to infer causal influence. If, instead, the appropriate statistical measure were known from the underlying physics, would the causal inference be more accurate or interpretable as a mechanism?
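Partial correlation, mentioned above as a common measure in causal inference, can be computed by regressing the conditioning variable out of both signals and correlating the residuals. The sketch assumes a hypothetical linear chain x -> z -> y with Gaussian noise, where x and y are clearly correlated but become nearly uncorrelated once the mediator z is accounted for; `partial_corr` is an illustrative helper name.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z (and a constant) from both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
z = x + rng.normal(size=n)      # hypothetical mediator: x -> z
y = z + rng.normal(size=n)      # y depends on x only through z

print(f"marginal correlation of x and y: {np.corrcoef(x, y)[0, 1]:.3f}")  # theory: 1/sqrt(3)
print(f"partial correlation given z: {partial_corr(x, y, z):.3f}")        # theory: 0
```

Note that a vanishing partial correlation is consistent with, but does not prove, a mediated mechanism; this is exactly the gap that knowledge of the underlying physics could help close.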
Understanding emergence of function. Can theory-driven machine learning, combined with sparse and indirect measurements, produce a mechanistic understanding of the emergence of biological function? Understanding the emergence of function is of critical importance in biology and medicine, environmental studies, biotechnology, and other biological sciences. The study of emergence critically relies on our ability to model collective action on a lower scale to predict how the phenomena on the higher scale emerge from this collective action.
Harnessing biologically-inspired learning. Can we harness biological learning to design more efficient algorithms and architectures? Artificial intelligence through deep learning is an exciting recent development that has seen remarkable success in solving problems that are difficult for humans. Typical examples include chess and Go, as well as the classical problem of image recognition, which, although superficially easy, engages broad areas of the brain. By contrast, some activities that biological neuronal networks are particularly good at remain beyond the reach of these techniques; for example, the control system of a mosquito engaged in evasion and targeting is remarkable considering the small neuronal network involved. This limitation provides opportunities for more detailed brain models to assist in developing new architectures and new learning algorithms.
Preventing overfitting. Can we use prior physics-based knowledge to avoid overfitting or non-physical predictions? How can we calibrate and validate the proposed models without overfitting? How can we apply cross-validation to simulated data, especially when the simulations may contain long-time correlations? From a conceptual point of view, this is a problem of supplementing the set of known physics-based equations with constitutive equations, an approach that has long been used in traditional engineering disciplines. While data-driven methods can provide solutions that are not constrained by preconceived notions or models, their predictions should not violate the fundamental laws of physics. Sometimes it is difficult to determine whether the model predictions obey these fundamental laws, especially when the functional form of the model cannot be determined explicitly. This makes it difficult to know whether the analysis predicts the correct answer for the right reasons. There are well-known examples of deep neural networks that appear to be highly accurate, but make highly inaccurate predictions when faced with data outside their training regime, and others that make highly inaccurate predictions based on seemingly minor changes to the target data. To address this limitation, there are numerous opportunities to combine machine learning and multiscale modeling towards a priori satisfying the fundamental laws of physics and, at the same time, preventing overfitting of the data.
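For the question of cross-validating simulated data with long-time correlations, one common option is blocked cross-validation: each test fold is a contiguous block in time, and a gap around it is excluded from training so that correlated neighbors do not leak information. The sketch below assumes a hypothetical time series with autoregressive noise and uses it to score polynomial models of increasing degree; the function names `blocked_kfold` and `cv_score` and the gap size are illustrative.

```python
import numpy as np


def blocked_kfold(n_samples, n_folds, gap=0):
    """Yield (train, test) index arrays; each test fold is one contiguous block in time,
    and `gap` samples on either side of it are withheld from training."""
    idx = np.arange(n_samples)
    bounds = np.linspace(0, n_samples, n_folds + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        test = idx[start:stop]
        train = idx[(idx < start - gap) | (idx >= stop + gap)]
        yield train, test


def cv_score(t, u, degree, n_folds=5, gap=10):
    """Mean squared test error of a polynomial fit under blocked cross-validation."""
    errors = []
    for train, test in blocked_kfold(t.size, n_folds, gap):
        coef = np.polyfit(t[train], u[train], degree)
        errors.append(np.mean((np.polyval(coef, t[test]) - u[test]) ** 2))
    return float(np.mean(errors))


# Hypothetical simulated signal with autocorrelated AR(1) noise.
rng = np.random.default_rng(6)
t = np.linspace(0.0, 10.0, 200)
noise = np.zeros(t.size)
for n in range(1, t.size):
    noise[n] = 0.9 * noise[n - 1] + 0.05 * rng.normal()
u = np.sin(t) + noise

scores = {d: cv_score(t, u, d) for d in range(1, 6)}
print(scores)
```

Because the outer blocks are scored by extrapolation, blocked cross-validation tends to penalize models that only interpolate well, which is often the relevant test for simulations.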

Minimizing data bias.
Can an arrhythmia patient trust a neural net controller embedded in a pacemaker that was trained under different environmental conditions than the ones during their own use? Training data come at various scales and different levels of fidelity. Data are typically generated by existing models, experimental assays, historical data, and other surveys, all of which come with their own inductive biases. Machine learning algorithms can only be as good as the data they have seen. This implies that proper care needs to be taken to safeguard against biased data sets. New theory-driven approaches could provide a rigorous foundation to estimate the range of validity, quantify the uncertainty, and characterize the level of confidence of machine learning based approaches.
Increasing rigor and reproducibility. Can we establish rigorous validation tests and guidelines to thoroughly test the predictive power of models built with machine learning algorithms? The use of open-source codes and data sharing by the machine learning community is a positive step, but more benchmarks and guidelines are needed for neural networks constrained by physics. Reproducibility has to be quantified in terms of statistical metrics, since many optimization methods are stochastic in nature and may lead to different results from run to run. In addition to memory limitations, the 32-bit arithmetic of current GPU systems is particularly troubling for modeling biological systems, where steep gradients and very fast multirate dynamics may require 64-bit arithmetic, which, in turn, may require ten times more computational time with current technologies.
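A toy illustration (our own, with hypothetical names and values) of why 32-bit arithmetic can fail in multirate settings: when the time step is dictated by a fast process, each forward-Euler update of a slow state of order one can fall below float32 resolution (about 1.2e-7 relative to 1) and is rounded away, whereas float64 accumulates it. The sketch emulates float32 by rounding every intermediate through the standard `struct` module rather than running on a GPU.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def integrate_slow_variable(y0, rate, dt, n_steps, single_precision):
    """Forward-Euler update y <- y + dt * rate for a slowly varying state.

    With single_precision=True every intermediate result is rounded to
    float32, mimicking 32-bit arithmetic.
    """
    rnd = to_float32 if single_precision else (lambda v: v)
    y = rnd(y0)
    increment = rnd(dt * rate)
    for _ in range(n_steps):
        y = rnd(y + increment)
    return y


if __name__ == "__main__":
    # Time step set by a fast process (1e-6); slow state of order 1.
    args = dict(y0=1.0, rate=0.01, dt=1e-6, n_steps=10_000)
    print("float64:", integrate_slow_variable(single_precision=False, **args))
    print("float32:", integrate_slow_variable(single_precision=True, **args))
```

With these values the float64 run reaches approximately 1.0001, while the float32 run stays at exactly 1.0 because every increment of 1e-8 is lost to rounding.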

CONCLUSIONS
Machine learning and multiscale modeling naturally complement and mutually benefit from one another. Machine learning can explore massive design spaces to identify correlations, and multiscale modeling can predict system dynamics to identify causality. Recent trends suggest that integrating machine learning and multiscale modeling could become key to better understanding biological, biomedical, and behavioral systems. Along those lines, we have identified five major challenges in moving the field forward.
The first challenge is to create robust predictive mechanistic models when dealing with sparse data. The lack of sufficient data is a common problem in modeling biological, biomedical, and behavioral systems. For example, it can result from inadequate experimental resolution or an incomplete medical history. A critical first step is to systematically identify the missing information. Experimentally, this can guide the judicious acquisition of new data or even the design of new experiments to complement the knowledge base. Computationally, this can motivate supplementing the available training data with computational simulations. Ultimately, the challenge is to maximize information gain and optimize efficiency by combining low- and high-resolution data and integrating data from different sources, which, in machine learning terms, calls for a multifidelity, multimodality approach.
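Combining a cheap low-fidelity model with a few expensive high-fidelity samples can be sketched with a linear autoregressive correction, y_H(x) ≈ ρ y_L(x) + δ, one of the simplest multifidelity schemes. This is our illustration, not a method described in the text; the toy models, the constant offset δ, and the function names are assumptions. Multifidelity Gaussian process approaches instead learn the discrepancy δ(x) as a function of x, with uncertainty.

```python
import math


def fit_linear_correction(x_high, y_high, low_fidelity):
    """Fit y_high ~ rho * low_fidelity(x) + delta by ordinary least squares.

    Only the few expensive high-fidelity samples enter the fit; the cheap
    low-fidelity model can be evaluated anywhere.
    """
    z = [low_fidelity(x) for x in x_high]
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y_high) / n
    s_zz = sum((zi - z_mean) ** 2 for zi in z)
    s_zy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y_high))
    rho = s_zy / s_zz
    delta = y_mean - rho * z_mean
    return rho, delta


def multifidelity_predictor(low_fidelity, rho, delta):
    """Return a surrogate that corrects the low-fidelity model."""
    return lambda x: rho * low_fidelity(x) + delta


if __name__ == "__main__":
    # Hypothetical toy models: the "expensive" model is a scaled and shifted
    # version of the cheap one, which a linear correction can capture.
    def cheap(x):
        return math.sin(8.0 * x)

    def expensive(x):
        return 2.0 * math.sin(8.0 * x) + 0.5

    x_hf = [0.1, 0.4, 0.7, 0.9]  # only four high-fidelity samples
    rho, delta = fit_linear_correction(x_hf, [expensive(x) for x in x_hf], cheap)
    surrogate = multifidelity_predictor(cheap, rho, delta)
    print(rho, delta, surrogate(0.25), expensive(0.25))
```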
The second challenge is to manage ill-posed problems. Unfortunately, ill-posed problems are relatively common in the biological, biomedical, and behavioral sciences and can result from inverse modeling, for example, when identifying parameter values or system dynamics. A potential solution is to combine deterministic and stochastic models. Coupling the deterministic equations of classical physics (the balance of mass, momentum, and energy) with the stochastic equations of living systems (cell-signaling networks or reaction-diffusion equations) could help guide the design of computational models for problems that are otherwise ill-posed. Along those lines, physics-informed neural networks and physics-informed deep learning are promising approaches that inherently use constrained parameter spaces and constrained design spaces to manage ill-posed problems. Beyond improving and combining existing techniques, we could even develop entirely novel architectures and algorithms, inspired by biological learning, to understand ill-posed biological problems.
The third challenge is to efficiently explore massive design spaces to identify correlations. With rapid developments in gene sequencing and wearable electronics, personalized biomedical data have become more accessible and less expensive than ever before. However, efficiently analyzing big datasets within massive design spaces remains a logistical and computational challenge. Multiscale modeling allows us to integrate physics-based knowledge to bridge the scales and efficiently pass information across temporal and spatial scales. Machine learning can use these insights for efficient model reduction, creating surrogate models that drastically reduce the underlying parameter space. Ultimately, the efficient analysis of big data, ideally in real time, is a challenging step towards bringing artificial intelligence solutions into the clinic.
The fourth challenge is to robustly predict system dynamics to identify causality. Indeed, this is the actual driving force behind integrating machine learning and multiscale modeling for biological, biomedical, and behavioral systems. Can we eventually use our models to identify relevant biological features and explore their interactions in real time? A very practical example of immediate translational value is whether we can identify disease progression biomarkers and elucidate mechanisms from massive datasets, for example, early biomarkers of neurodegenerative disease, by exploiting the fundamental laws of physics. On a more abstract level, the ultimate challenge is to advance data- and theory-driven approaches to create a mechanistic understanding of the emergence of biological function, explaining phenomena at higher scales as a result of collective action at lower scales.
The fifth challenge is to know the limitations of machine learning and multiscale modeling. Important steps in this direction are analyzing sensitivity and quantifying uncertainty. While machine learning tools are increasingly used to perform sensitivity analysis and uncertainty quantification for biological systems, they are at high risk of overfitting and generating non-physical predictions. Ultimately, our approaches can only be as good as the underlying models and the data they have been trained on, and we must be aware of model limitations and data bias. Preventing overfitting, minimizing data bias, and increasing rigor and reproducibility have been, and will always remain, the major challenges in creating predictive models for biological, biomedical, and behavioral systems.
are generic population-based optimization algorithms that adopt mechanisms inspired by biological evolution, including reproduction, mutation, recombination, and selection, to characterize biological systems. Example: Automatic parameter tuning in multiscale brain modeling [Dura-Bernal et al., 2017].
Gaussian process regression is a nonparametric, Bayesian approach to regression to create surrogate models and quantify uncertainty. Examples: Creating surrogate models to characterize the effects of drugs on features of the electrocardiogram [Sahli Costabal et al., 2019a] or of material properties on the stress profiles from reconstructive surgery [Lee et al., 2019a].
are statistical models that aim to capture the joint distribution between a set of observed or latent random variables. Example: Using deep generative models for chemical space exploration and matter engineering [Sanchez-Lengeling et al., 2019].
Multi-fidelity modeling is a supervised learning approach to synergistically combine abundant, inexpensive, low-fidelity data and sparse, expensive, high-fidelity data from experiments and simulations to create efficient and robust surrogate models. Examples: Simulating the mixed convection flow past a cylinder [Perdikaris et al., 2016] and cardiac electrophysiology [Sahli Costabal et al., 2019b].
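A minimal Gaussian process regression surrogate, written in plain Python to show the mechanics behind the glossary entry: the posterior mean interpolates the training data, and the posterior variance quantifies uncertainty away from it. This is a sketch under stated assumptions: a zero-mean prior, a squared-exponential kernel with fixed, hand-picked hyperparameters (length scale, variance, noise), and hypothetical function names. In practice the hyperparameters are usually fit by maximizing the marginal likelihood, typically with an established library.

```python
import math


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward_sub_transpose(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_sub_transpose(L, forward_sub(L, y_train))  # K^-1 y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf_kernel(xs, a) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # L^-1 k_*
        variances.append(rbf_kernel(xs, xs) - sum(vi * vi for vi in v))
    return means, variances


if __name__ == "__main__":
    x_train = [0.0, 0.25, 0.5, 0.75, 1.0]
    y_train = [math.sin(2.0 * math.pi * x) for x in x_train]
    x_test = [0.25, 0.6, 3.0]
    mean, var = gp_predict(x_train, y_train, x_test)
    for xs, m, v in zip(x_test, mean, var):
        print(f"x={xs:4.2f}  mean={m:+.3f}  std={math.sqrt(max(v, 0.0)):.3f}")
```

At a training input the predicted variance is close to the noise level, while far from the data (x = 3.0) the mean reverts to zero and the variance to the prior value of 1, which is how the surrogate signals where it should not be trusted.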

Figure 3. Partial differential equations encode physics-based knowledge into machine learning. Physics imposed on neural networks. The neural network on the left, as yet unconstrained by physics, represents the solution u(x,t) of the partial differential equation; the neural network on the right describes the residual f(x,t) of the partial differential equation. The example illustrates a one-dimensional version of the Schrödinger equation with unknown parameters λ1 and λ2 to be learned. In addition to unknown parameters, we can learn missing functional terms in the partial differential equation. Currently, this optimization is done empirically based on trial and error by a human-in-the-loop. Here, the u-architecture is a fully connected neural network, while the f-architecture is dictated by the partial differential equation and is, in general, not possible to visualize explicitly. Its depth is proportional to the highest derivative in the partial differential equation times the depth of the uninformed u neural network.
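The construction in Figure 3 can be sketched in a few lines: the residual f is not a separate network with its own weights, but is obtained by differentiating the u-network and inserting its derivatives into the partial differential equation. The caption does not state the equation, so we assume, for illustration only, the nonlinear form i u_t + λ1 u_xx + λ2 |u|² u = 0, with the complex solution u = p + i q represented by a single-hidden-layer tanh network; all names are hypothetical. Derivatives are taken analytically here; training (minimizing a data misfit on u plus this physics loss over the weights and λ1, λ2) would normally rely on an automatic-differentiation framework and is omitted.

```python
import math
import random


def init_params(n_hidden=8, seed=0):
    """Random weights of a one-hidden-layer tanh network for u = p + i q."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]

    # w, v: input weights on x and t; b: hidden biases;
    # a, d: output weights for the real part p and imaginary part q.
    return {"w": draw(), "v": draw(), "b": draw(), "a": draw(), "d": draw()}


def u_and_derivatives(params, x, t):
    """Return p, q and the derivatives p_t, q_t, p_xx, q_xx at (x, t)."""
    p = q = p_t = q_t = p_xx = q_xx = 0.0
    for w, v, b, a, d in zip(params["w"], params["v"], params["b"],
                             params["a"], params["d"]):
        h = math.tanh(w * x + v * t + b)
        dh = 1.0 - h * h        # d tanh(z) / dz
        d2h = -2.0 * h * dh     # d^2 tanh(z) / dz^2
        p += a * h
        q += d * h
        p_t += a * v * dh
        q_t += d * v * dh
        p_xx += a * w * w * d2h
        q_xx += d * w * w * d2h
    return p, q, p_t, q_t, p_xx, q_xx


def residual(params, lam1, lam2, x, t):
    """Real and imaginary parts of f = i u_t + lam1 u_xx + lam2 |u|^2 u (assumed form)."""
    p, q, p_t, q_t, p_xx, q_xx = u_and_derivatives(params, x, t)
    mod2 = p * p + q * q
    f_re = -q_t + lam1 * p_xx + lam2 * mod2 * p
    f_im = p_t + lam1 * q_xx + lam2 * mod2 * q
    return f_re, f_im


def physics_loss(params, lam1, lam2, collocation_points):
    """Mean squared residual |f|^2 over collocation points (x, t)."""
    total = 0.0
    for x, t in collocation_points:
        f_re, f_im = residual(params, lam1, lam2, x, t)
        total += f_re * f_re + f_im * f_im
    return total / len(collocation_points)


if __name__ == "__main__":
    params = init_params()
    points = [(0.1 * i, 0.05 * j) for i in range(-5, 6) for j in range(5)]
    # Hypothetical trial values for the unknown parameters lambda1, lambda2.
    print(physics_loss(params, lam1=0.5, lam2=1.0, collocation_points=points))
```

Because f reuses the weights of u and involves its second derivative in x, the f-network is deeper than the u-network, consistent with the caption's remark on its depth.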
