## Introduction

Advances in patient care have led to the availability of large amounts of data, generated by typical examinations, such as blood sample analysis, clinical imaging (e.g., CT, MRI), and biopsy sampling, as well as by innovative ‘-omics’ sequencing techniques1,2. Such clinical data are the cornerstone in the practice of personalized medicine and specifically in the field of oncology3,4. However, this abundance of information comes with multiple issues related to data exploitation and synthesis towards the prediction of pathology dynamics. In particular, we identify the following two major challenges: (C1) First, knowledge of the regulatory mechanisms underlying clinical data is largely lacking, and (C2) second, patient data collection is usually sparse in time, since patient clinical visits/examinations are a limiting factor.

Regarding challenge (C1), scientists have long been supported by mathematical modeling as a tool to identify causal relationships in experimental and clinical data, particularly in cancer treatment5,6,7. Mathematical models make it possible to propose and test biological hypotheses, analyze the sensitivity of observables with respect to biological parameters, and provide insights into the mechanistic details governing the phenomenon of interest8,9,10. Although these models can be extremely powerful both in predicting system responses and in suggesting new experimental directions, they require adequate knowledge of the underlying biological mechanisms of the analyzed system. Typically, this knowledge is incomplete, and the mechanistic interactions are sufficiently known for only a limited portion of the variables involved. Therefore, even though mathematical models provide a good description of a simplified version of the associated system dynamics, they do not always allow for accurate and quantitative predictions.

On the other hand, machine learning techniques are suited to the inherent complexity of biomedical problems, but they do not account for knowledge of the underlying interactions11. While mathematical models rely on causality, statistical learning methods identify correlations among data12. This approach makes it possible to systematically process large amounts of data and infer hidden patterns in biological systems. As a consequence, machine learning-based techniques can provide valuable predictive accuracy upon sufficient training, but do not typically allow for any mechanistic insight into the investigated problem13. An overall understanding of the fundamental system dynamics becomes almost impossible, as does the chance to generalize the ‘learnt’ system behavior. The latter issue is further exacerbated by challenge (C2), the sparseness of clinical data. For a single patient in particular, such information is only available at a few time-points, corresponding to clinical presentation.

To face these two challenges, with the final aim of improving personalized predictions, we propose a novel (to the best of our knowledge) Bayesian method that combines mathematical modeling and statistical learning (BaM3). As a proof-of-concept, the proposed method is tested on a synthetic dataset of brain tumor growth. We analyze the performance of the new approach in predicting two relevant clinical outcomes, namely tumor burden and infiltration. When comparing predictions from the mechanistic model with those from the BaM3 method, we obtain improved predictions for the vast majority of virtual patients. We also apply the approach to a clinical dataset of patients suffering from chronic lymphocytic leukemia (CLL). The BaM3 method shows excellent agreement between the predicted clinical output and the reported data. Finally, as an additional test case, we show how the proposed methodology can be used to assess the time-to-relapse (TtR) in a dataset of ovarian cancer patients.

## Methods

### Formal definition of BaM3

We start by assuming a random variable (r.v.) triplet (Y, Xm, Xu) that denotes the system’s modelable variables/data Xm, its unmodelable variables/data Xu (e.g., patient’s age or sex, results of different ‘-omics’ techniques, etc.), and the associated observed clinical outputs Y. We then introduce t0 as the clinical presentation time of a patient, at which the patient-specific r.v. realizations $$({\bf{X}}_{m}={\bf{x}}_{m}^{*},{\bf{X}}_{u}={\bf{x}}_{u}^{*})$$ are obtained. The overall goal of the method is to predict the patient’s clinical outputs by an estimate $${\bf{Y}}=\hat{{\bf{y}}}$$ at a certain prediction time tp. The true clinical outputs of the patient will be denoted as y. Moreover, we consider the existence of an N-patient ensemble dataset (y, xm, xu). In this dataset all the variables (i.e., modelables, unmodelables, and clinical outputs) are recorded at the time of diagnosis td, which might differ from one patient to another. Both t0 and td are calculated from the onset of the disease. We introduce two distinct times to account for the variability of the disease stage among different patients (td) and the time at which a specific patient is presented to the clinic (t0) (see the corresponding Fig. S1).

The core idea of the method is to consider the predictions of the mathematical model $$p({\bf{Y}}=\hat{{\bf{y}}}| {\bf{X}}_{m}={\bf{x}}_{m}^{*})$$ as an informative Bayesian prior of the posterior distribution $$p({\bf{Y}}=\hat{{\bf{y}}}| {\bf{X}}_{m}={\bf{x}}_{m}^{*},{\bf{X}}_{u}={\bf{x}}_{u}^{*})$$. We can prove that:

$$\begin{array}{l}p({\bf{Y}}=\hat{{\bf{y}}}| {\bf{X}}_{m}={\bf{x}}_{m}^{*},{\bf{X}}_{u}={\bf{x}}_{u}^{*})\propto \\ p({\bf{Y}}=\hat{{\bf{y}}}| {\bf{X}}_{m}={\bf{x}}_{m}^{*})\,p({\bf{Y}}=\hat{{\bf{y}}}| {\bf{X}}_{u}={\bf{x}}_{u}^{*}).\end{array}$$
(1)

The implementation of the BaM3 method therefore reduces to the calculation of the aforementioned probability distributions. Although the prediction of the probability distribution function (pdf) of the clinical outputs is rather straightforward for the mathematical model, obtaining the pdf of the patient’s unmodelable data is not trivial. To retrieve the latter, we use a density estimator method upon the patient ensemble dataset to derive p(Y, Xu), and then consider the patient-specific realization $${\bf{X}}_{u}={\bf{x}}_{u}^{*}$$. For further details about method derivation and estimators of performance, see Supplementary Note 1.
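The product in Eq. (1) can be illustrated numerically. The following is a minimal sketch, assuming both pdfs have already been evaluated on a shared one-dimensional grid of the clinical output; the function name `bam3_posterior` and the Gaussian toy pdfs are illustrative assumptions and are not part of the original method description.

```python
import numpy as np

def bam3_posterior(prior_pdf, data_pdf, dy):
    """Pointwise product of two pdfs on a shared grid, renormalized to integrate to 1."""
    unnorm = np.asarray(prior_pdf, float) * np.asarray(data_pdf, float)
    total = unnorm.sum() * dy
    if total <= 0:
        raise ValueError("the two pdfs have no overlapping support")
    return unnorm / total

def gaussian(y, mu, sigma):
    """Normal density, used here only to build toy inputs."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy example (all numbers are hypothetical):
y = np.linspace(0.0, 10.0, 2001)
dy = y[1] - y[0]
model_pdf = 0.5 * gaussian(y, 3.0, 0.5) + 0.5 * gaussian(y, 7.0, 0.5)  # bimodal model-derived pdf
data_pdf = gaussian(y, 6.5, 1.0)                                       # data-derived pdf
posterior = bam3_posterior(model_pdf, data_pdf, dy)

model_mean = (y * model_pdf).sum() * dy      # expected value from the model alone
posterior_mean = (y * posterior).sum() * dy  # expected value after the combination
```

In this toy case the data-derived pdf suppresses one of the two modes of the model-derived pdf, so the expected value moves toward the remaining mode.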

### Testing the method on synthetic glioma growth

The equations of the selected mathematical model14,15 (‘full model’) describe the spatio-temporal dynamics of tumor cell density (c), oxygen concentration (n), and vascular density (v) in the context of glioma tumor growth. The full model includes the variation of cell motility and proliferation due to phenotypic plasticity of tumor cells induced by microenvironmental hypoxia16,17,18,19,20. It also accounts for oxygen consumption by tumor cells, formation of new vessels due to tumor angiogenesis, and vaso-occlusion by compression from tumor cells15,21,22. We generate N = 500 virtual patients by sampling the parameters of the full model from a uniform distribution over the available experimental range. We consider the tumor cell spatial density c to be the modelable variable. Moreover, we treat the integrals over the tissue of oxygen concentration and vascular density, denoted as $$\bar{n}$$ and $$\bar{v}$$, respectively, as the unmodelable quantities. Starting from the same initial conditions, we simulate the behavior of each virtual patient for 3 years, storing the values of all variables every month. As sketched in Fig. 1, we use the modelable variable to set up a mathematical model. In particular, we take c(x, t0) at a specific time-point, the clinical presentation time t0, and use it as the initial condition for a Fisher-Kolmogorov equation23,24,25,26,27 (‘FK model’). We use this model to predict tumor behavior at a specific time in the future, the prediction time tp. For each simulated patient we calculate the tumor size (TS) and infiltration width (IW). In parallel, for each patient we draw the diagnosis time td as a random number in the interval [t0 − 6, t0 + 6] (in months), and collect the values of modelables, unmodelables, and clinical outputs at this time to build the patient ensemble. Given the patient-specific modelable and unmodelable variables (c(x, t0) and $$\bar{n},\bar{v}$$, respectively) at the clinical presentation time t0, the BaM3 method therefore produces the probability of observing the TS and IW at a specific prediction time tp.

### Mathematical models for glioma growth

The system variables are the density of glioma cells c(x, t), the concentration of oxygen n(x, t), and the density of functional vasculature v(x, t)14,15. For simplicity we consider a one-dimensional computational domain. We normalize the system variables to their carrying capacity and write the system as

$$\frac{\partial c}{\partial t}=D\frac{{\partial }^{2}}{\partial {x}^{2}}\left(\frac{\alpha }{{\alpha }_{0}}c\right)+b\frac{\beta }{{\beta }_{0}}c(1-c),$$
(2)
$$\frac{\partial n}{\partial t}={D}_{n}\frac{{\partial }^{2}n}{\partial {x}^{2}}+{h}_{1}v({n}_{0}-n)-{h}_{2}cn,$$
(3)
$$\frac{\partial v}{\partial t}={D}_{v}\frac{{\partial }^{2}v}{\partial {x}^{2}}+{g}_{1}{{{{{{{\mathcal{H}}}}}}}}(n-{n}_{0})v(1-v)-{g}_{2}v{c}^{\delta }.$$
(4)

Here $$\mathcal{H}(\cdot )$$ is a sigmoidal function, $$\mathcal{H}(x-{x}_{0})=1/\left(1+\exp (b(x-{x}_{0}))\right)$$ with b > 0 being a constant, allowing for tumor angiogenesis in hypoxic conditions, i.e., for n < n0, where n0 is the hypoxic oxygen threshold. Then, the functions α = α(n) and β = β(n) account for the dependence of cellular motility and proliferation on the oxygen level, respectively16,17,19. They are defined as:

$$\alpha =\frac{{\lambda }_{1}-n}{({\lambda }_{2}-1)n+{\lambda }_{1}},$$
(5)
$$\beta =\frac{{\lambda }_{2}n}{({\lambda }_{2}-1)n+{\lambda }_{1}}.$$
(6)

When the oxygen level is fixed to the maximum level n = 1 in the tissue, α = α0 and β = β0, so that the equation for c reduces to

$$\frac{\partial c}{\partial t}=D\frac{{\partial }^{2}c}{\partial {x}^{2}}+bc(1-c),$$
(7)

which we denote in the rest of the manuscript as the Fisher-Kolmogorov (FK) model for tumor cell density. We remark that Eq. (7) has been extensively used to predict untreated glioma kinetics based on patient-specific parameters from standard medical imaging procedures23,24,25,26,27.
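For illustration, Eq. (7) can be integrated with a simple explicit finite-difference scheme. The sketch below assumes a cell-centered one-dimensional grid with zero-gradient ghost cells at both ends (a discrete no-flux condition); the function names, grid, and parameter values are hypothetical choices and do not reproduce the numerical scheme or the parameters used in the study.

```python
import numpy as np

def fk_step(c, D, b, dx, dt):
    """One explicit Euler step of dc/dt = D c_xx + b c (1 - c) with no-flux ends."""
    padded = np.pad(c, 1, mode="edge")  # ghost cells copy the edge values
    lap = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
    return c + dt * (D * lap + b * c * (1.0 - c))

def solve_fk(c0, D, b, dx, t_end, safety=0.4):
    """Advance the profile c0 to time t_end with a stable time step."""
    dt = safety * dx**2 / (2.0 * D)       # below the explicit diffusion limit dx^2 / (2 D)
    n_steps = int(np.ceil(t_end / dt))
    dt = t_end / n_steps                  # land exactly on t_end
    c = np.array(c0, dtype=float)
    for _ in range(n_steps):
        c = fk_step(c, D, b, dx, dt)
    return c

# Illustrative setup (arbitrary units, not the values of Table S1):
dx = 0.1
x = (np.arange(200) + 0.5) * dx           # cell centers on a domain of length 20
c_start = np.where(x < 1.0, 0.8, 0.0)     # tumor initially confined near x = 0
c_end = solve_fk(c_start, D=0.01, b=0.5, dx=dx, t_end=10.0)
```

With these settings the normalized density stays between 0 and 1, and setting `b=0` reduces the scheme to pure diffusion, which conserves the total cell mass on this grid.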

Eqs. (2)–(4) define an extended version of the FK equation, enriched with nonlinear glioma cell diffusion and proliferation terms. The latter terms depend on the oxygen concentration in the tumor microenvironment, which is in turn coupled to cell density through the oxygen consumption term. The functional vascular density controls the supply of oxygen to the tissue. Blood vessel density increases due to tumor angiogenesis and decreases because of vaso-occlusion by high tumor cell density. The values of the parameters used in the simulations and their descriptions are given in Table S1. In addition, a typical full model simulation is shown in Fig. S3 for a representative patient.

We solve the system in Eqs. (2)–(4) by imposing the initial conditions:

$$c(x,0)={c}_{0}{{{{{{{\mathcal{H}}}}}}}}(x-\varepsilon )\ \ {{{{{{{\rm{in}}}}}}}}\ \ 0\,\le\, x\,\le\, L,$$
(8)
$$n(x,0)={n}_{0}\ \ {{{{{{{\rm{in}}}}}}}}\ \ 0\,\le\, x\,\le\, L,$$
(9)
$$v(x,0)={v}_{0}\ \ {{{{{{{\rm{in}}}}}}}}\ \ 0\,\le\, x\,\le\, L,$$
(10)

where the positive parameters c0, n0, and v0 are the initial density of glioma cells spatially distributed in a segment of length ε, the oxygen concentration, and the density of functional tumor vasculature, respectively. Then, L > 0 is the length of the one-dimensional computational domain. In addition, we consider an isolated host tissue in which all system behaviors arise solely due to the interaction terms in Eqs. (2)–(4). This assumption results in no-flux boundary conditions of the form:

$$\frac{\partial c}{\partial x}(0,t)=\frac{\partial n}{\partial x}(0,t)=\frac{\partial v}{\partial x}(0,t)=0,$$
(11)
$$\frac{\partial c}{\partial x}(L,t)=\frac{\partial n}{\partial x}(L,t)=\frac{\partial v}{\partial x}(L,t)=0.$$
(12)

Both the full and FK models are used to calculate two clinical outputs, namely the tumor IW and TS. The IW at a specific time is defined as the distance between the points where the glioma cell density is 80% and 2% of the maximum cellular density. In turn, the TS is obtained by integrating the spatial profile of tumor density and dividing the result by the maximum value of the density.
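The two outputs can be computed from a sampled density profile as in the following sketch. It assumes a profile whose front decays toward larger x (tumor centered at the left boundary) and uses linear interpolation between grid points; the helper names are hypothetical.

```python
import numpy as np

def crossing_position(x, c, level):
    """Right-most position where the profile drops through `level` (linear interpolation)."""
    above = np.nonzero(c >= level)[0]
    i = above[-1]
    if i == len(c) - 1:
        return x[-1]                      # profile never drops below the level
    frac = (c[i] - level) / (c[i] - c[i + 1])
    return x[i] + frac * (x[i + 1] - x[i])

def infiltration_width(x, c, hi=0.80, lo=0.02):
    """Distance between the 80% and 2% points of the maximum density."""
    cmax = c.max()
    return crossing_position(x, c, lo * cmax) - crossing_position(x, c, hi * cmax)

def tumor_size(x, c):
    """Integral of the density profile (trapezoidal rule) divided by its maximum."""
    integral = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(x))
    return integral / c.max()

# Toy profile: a linear front, for which both outputs are known in closed form.
x = np.linspace(0.0, 20.0, 2001)
c = np.clip(1.0 - x / 10.0, 0.0, 1.0)
iw = infiltration_width(x, c)   # 9.8 - 2.0 = 7.8
ts = tumor_size(x, c)           # area of the triangle, 5.0, divided by the maximum 1.0
```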

We run the full model and simulate the growth of the tumor for N patients, each one with a parameter set drawn randomly from a uniform distribution over the parameter range. We run simulations for N = 50, 100, 250, and 500, with 10 repetitions for each value of N. To generate the patients, we vary five of the parameters listed in Table S1, namely the tumor motility D, the proliferation rate b, the oxygen consumption rate h2, and the vascular formation and occlusion rates g1 and g2, respectively. Then, we use the tumor density at the time of clinical presentation, t0, as the initial condition for the FK model. The latter model is employed to generate predictions at the prediction time tp. We also consider the unmodelable variables and clinical outputs at the diagnosis time td, taken randomly within t0 ± 6 months, to build the patient ensemble. Finally, we use the results of the full model in terms of clinical outputs as the ground truth to be compared with the predictions of the FK model alone and with the ones obtained by the BaM3 method.

### Probability distribution from the FK model

As described in the previous sections, we take the spatial profile of tumor density at the clinical presentation time t0 as the initial condition of the FK model. We use the latter mathematical model to run simulations over the whole parameter set for cell motility D and proliferation rate b. Then, we define the model-derived pdf as follows. For each pair of clinical outputs IW* and TS* we calculate the area Aα(IW*,TS*) over the (IW,TS) plane as Aα = [(1 − α)IW* < IW < (1 + α)IW*, (1 − α)TS* < TS < (1 + α)TS*], where α is a given tolerance (here α = 0.05). Then, we calculate the pdf by normalizing Aα by the total area of predicted IW and TS values. We store the value of the probability for each patient at the different prediction times and use it to compute the expected value of the model pdf.
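One possible reading of this construction is sketched below: for each candidate pair (IW*, TS*) on a grid, count the simulated FK outputs that fall inside the relative tolerance box, then normalize the counts so that they sum to one. This is an interpretation offered for illustration only; the function name, the grids, and the synthetic stand-in for the FK outputs are assumptions.

```python
import numpy as np

def tolerance_box_pmf(iw_sim, ts_sim, iw_grid, ts_grid, alpha=0.05):
    """Normalized counts of simulated (IW, TS) pairs inside a +/- alpha relative box."""
    iw_sim = np.asarray(iw_sim, float)
    ts_sim = np.asarray(ts_sim, float)
    counts = np.zeros((len(iw_grid), len(ts_grid)))
    for i, iw_star in enumerate(iw_grid):
        in_iw = np.abs(iw_sim - iw_star) < alpha * iw_star
        for j, ts_star in enumerate(ts_grid):
            in_ts = np.abs(ts_sim - ts_star) < alpha * ts_star
            counts[i, j] = np.count_nonzero(in_iw & in_ts)
    total = counts.sum()
    return counts / total if total > 0 else counts

# Synthetic stand-in for the FK outputs obtained over the (D, b) parameter set:
rng = np.random.default_rng(0)
iw_sim = rng.normal(5.0, 0.1, size=2000)
ts_sim = rng.normal(10.0, 0.2, size=2000)
iw_grid = np.linspace(3.0, 7.0, 41)
ts_grid = np.linspace(8.0, 12.0, 41)
pmf = tolerance_box_pmf(iw_sim, ts_sim, iw_grid, ts_grid)

# Expected values of the two outputs under the discrete distribution
iw_expected = (pmf.sum(axis=1) * iw_grid).sum()
ts_expected = (pmf.sum(axis=0) * ts_grid).sum()
```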

### Probability distribution of the unmodelables from the full model

To retrieve the data-derived pdf we use a normal kernel density estimator (KDE)28,29, which depends upon all the data points in the patient ensemble. Briefly, the method estimates the joint probability $$p({\rm{IW}},{\rm{TS}},\bar{n},\bar{v})$$ from which the ensemble entries are drawn through the sum of a kernel function over all the entries of the dataset. The kernel function is characterized by a hyperparameter, the bandwidth $$\tilde{h}$$, which we set according to Silverman’s rule of thumb

$${\tilde{h}}_{i}={\sigma }_{i}{\left[\frac{4}{(d+2)n}\right]}^{\frac{1}{d+4}},\quad i=1,2,\ldots d,$$
(13)

where d is the number of dimensions, n is the number of observations, and σi is the standard deviation of the ith variate30. After calculating $$p({{{{{{{\rm{IW}}}}}}}},{{{{{{{\rm{TS}}}}}}}},\bar{n},\bar{v})$$, we specify the realization of a specific patient and calculate the value of $$p({{{{{{{\rm{IW}}}}}}}}={{{{{{{{\rm{IW}}}}}}}}}^{* },{{{{{{{\rm{TS}}}}}}}}={{{{{{{{\rm{TS}}}}}}}}}^{* },\bar{n}={\bar{n}}^{* },\bar{v}={\bar{v}}^{* })$$ over the (IW,TS) space of the estimated clinical outputs.
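A minimal sketch of a normal product-kernel KDE with the bandwidths of Eq. (13) is given below. The function names and the synthetic ensemble are illustrative assumptions; the sample standard deviation (`ddof=1`) is also an assumption, since the text does not specify which estimator is used.

```python
import numpy as np

def silverman_bandwidths(data):
    """Per-dimension bandwidths from Eq. (13); data has shape (n_observations, d)."""
    n, d = data.shape
    sigma = data.std(axis=0, ddof=1)
    return sigma * (4.0 / ((d + 2.0) * n)) ** (1.0 / (d + 4.0))

def gaussian_kde_pdf(data, query, h=None):
    """Evaluate a Gaussian product-kernel density estimate at query points of shape (m, d)."""
    data = np.asarray(data, float)
    query = np.asarray(query, float)
    if h is None:
        h = silverman_bandwidths(data)
    n, d = data.shape
    z = (query[:, None, :] - data[None, :, :]) / h        # shape (m, n, d)
    kernels = np.exp(-0.5 * np.sum(z**2, axis=2))          # unnormalized kernels
    norm = n * np.prod(h) * (2.0 * np.pi) ** (d / 2.0)
    return kernels.sum(axis=1) / norm

# Synthetic 4-column ensemble standing in for (IW, TS, n_bar, v_bar):
rng = np.random.default_rng(0)
ensemble = rng.normal(loc=[5.0, 10.0, 0.5, 0.3], scale=[1.0, 2.0, 0.1, 0.05], size=(500, 4))
h = silverman_bandwidths(ensemble)
density_at_center = gaussian_kde_pdf(ensemble, np.array([[5.0, 10.0, 0.5, 0.3]]))[0]
```

Evaluating the estimate on an (IW, TS) grid while holding the last two columns at a patient's values gives the slice of the joint density described in the text.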

### Scoring glioma growth predictions

We calculate for each patient the relative errors dm and db as described in the main text. To assess how the BaM3 method has changed the prediction of the mathematical model, we compare the latter quantities: if ∣db − dm∣ ≤ εdm, then there was no change; if db > (1 + ε)dm, then the method deteriorated the prediction of the model; if db < (1 − ε)dm, then the method improved the prediction of the model. Here, ε is a tolerance used for the comparison, taken to be ε = 0.05.
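The comparison rule can be written compactly as in the sketch below. The function names are hypothetical, and the ratios follow the definition of the improved, unchanged, and deteriorated fractions over all patients.

```python
def classify_change(d_m, d_b, eps=0.05):
    """Label one patient by comparing the BaM3 error d_b with the model error d_m."""
    if d_b < (1.0 - eps) * d_m:
        return "improved"
    if d_b > (1.0 + eps) * d_m:
        return "deteriorated"
    return "unchanged"

def score_ratios(d_m_all, d_b_all, eps=0.05):
    """Fractions of improved, unchanged, and deteriorated patients, in that order."""
    labels = [classify_change(m, b, eps) for m, b in zip(d_m_all, d_b_all)]
    n = len(labels)
    return tuple(labels.count(k) / n for k in ("improved", "unchanged", "deteriorated"))

# Hypothetical errors for four patients:
ratios = score_ratios([1.0, 1.0, 1.0, 1.0], [0.5, 1.0, 2.0, 0.9])
```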

### Calculation of the effective variance

To calculate the effective variance s, we first calculate the mixed central moments Σij of the pdf of interest according to the formula

$${\Sigma }_{ij}=\int_{{y}_{1}}\int_{{y}_{2}}({y}_{i}-{\mu }_{i})({y}_{j}-{\mu }_{j})f({y}_{1},{y}_{2})\ d{y}_{1}d{y}_{2},\quad i,j=1,2,$$
(14)

where y1 and y2 are the clinical outputs (IW and TS, respectively) and μ1, μ2 the expected values of the corresponding variables. The elements of Σ form a symmetric 2 × 2 matrix, for which we calculate the determinant. We define the effective variance s as the natural logarithm of this determinant. In Eq. (14) we consider f(y1, y2) to be the pdf from the mathematical model or from the BaM3 method, depending on whether we are interested in the effective variance sm or sb, respectively.
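As a sketch, the effective variance of a pdf sampled on a regular (y1, y2) grid can be computed with simple Riemann sums. The function name and the Gaussian test pdfs are assumptions made for illustration.

```python
import numpy as np

def effective_variance(pdf, y1, y2):
    """Natural log of the determinant of the covariance of a pdf given on grid (y1[i], y2[j])."""
    dy1 = y1[1] - y1[0]
    dy2 = y2[1] - y2[0]
    mass = pdf * dy1 * dy2
    mass = mass / mass.sum()                       # probability mass per grid cell
    Y1, Y2 = np.meshgrid(y1, y2, indexing="ij")
    mu1 = (Y1 * mass).sum()
    mu2 = (Y2 * mass).sum()
    s11 = ((Y1 - mu1) ** 2 * mass).sum()
    s22 = ((Y2 - mu2) ** 2 * mass).sum()
    s12 = ((Y1 - mu1) * (Y2 - mu2) * mass).sum()
    cov = np.array([[s11, s12], [s12, s22]])
    return np.log(np.linalg.det(cov))

def gaussian_2d(y1, y2, sigma1, sigma2):
    """Independent zero-mean bivariate normal on a grid (toy input only)."""
    Y1, Y2 = np.meshgrid(y1, y2, indexing="ij")
    return np.exp(-0.5 * ((Y1 / sigma1) ** 2 + (Y2 / sigma2) ** 2)) / (2.0 * np.pi * sigma1 * sigma2)

y1 = np.linspace(-5.0, 5.0, 401)
y2 = np.linspace(-20.0, 20.0, 401)
s_wide = effective_variance(gaussian_2d(y1, y2, 0.5, 2.0), y1, y2)     # det = 0.25 * 4 = 1
s_narrow = effective_variance(gaussian_2d(y1, y2, 0.25, 1.0), y1, y2)  # narrower pdf
```

A narrower pdf yields a smaller value, which is how the quantity is used to compare the model-derived and BaM3 distributions.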

### Pdf from the two-compartment model in CLL

Messmer and colleagues31 measured the fraction of labeled B-CLL cells in a cohort of 17 CLL patients that were administered deuterated water. They calibrated a two-compartment model on each patient and were able to reproduce the kinetics of labeled cells over a long time. We adopt their model and use it to generate the pdf for the CLL example. The fraction of labeled cells over time is calculated through the expression

$$f(t)=h(t)+f(0){e}^{\frac{-bt}{{v}_{r}}}+\frac{{e}^{\frac{-bt}{{v}_{r}}}\left(g(0)-h(t)\right)}{{v}_{r}-1}+\frac{{e}^{-bt}\left(h(t)-g(0)\right)}{{v}_{r}-1}$$
(15)

where g(0) is the initial fraction of cells in the first compartment, b the fractional cell birth, vr the relative size of the compartments, and h(t) is the deuterated water concentration of the body over time. The latter is a function of the fractional daily water exchange fw. We refer the interested reader to the supplementary information of Messmer et al.31 for a more detailed description of the model and a full account of the model parameters. In this work we focus on three quantities, namely b, vr, and fw, and run the model in Eq. (15) over the experimental range. This range was obtained by considering the patient-specific fitting performed by Messmer and colleagues and selecting the minimum and maximum values. We evaluate the fraction of labeled cells at day 50, f50, and build the probability distribution from its histogram, by counting the number of occurrences of a given $${f}_{50}^{* }$$ for $$\min ({f}_{50}) < {f}_{50}^{* } \, < \, \max ({f}_{50})$$ and then normalizing the result. For the CLL example, all the patients start with the same initial fraction of labeled cells, set to zero.

### Pdf from the patients’ unmodelables in CLL

The data-derived pdf in the CLL example is obtained from four unmodelable quantities that are measured for each patient during the study. We consider all the possible combinations of unmodelables and calculate the mean squared error (MSE) for each case. The scatter plot in the same figure refers to the case in which the CD38 expression (xu,1), age (xu,2), growth rate of white blood cells (xu,3), and VH mutation status (xu,4) are added consecutively in the specified order. As in the glioma example, we build the sub-dataset (y, xu), where y and xu = (xu,i) are the f50 and unmodelable variables of each patient, respectively, and apply the KDE using Silverman’s rule for the hyperparameters. The requested pdf, i.e., $$p(Y=\hat{y}| {\bf{X}}_{u}={\bf{x}}_{u}^{*})$$, is obtained by conditioning the probability from the KDE on the realizations of the unmodelables of the specific patient and calculating the result over the range of the estimated clinical output $$\hat{y}={f}_{50}$$.

### Mathematical model for ovarian cancer

We assume the total number of tumor cells T to be composed of the sensitive S and resistant R subpopulations. The latter are described by the following system of ordinary differential equations (ODEs):

$$\dot{S}=\gamma S-\delta S-\tau S,$$
(16)
$$\dot{R}=\gamma R-\lambda \delta R+\tau S,$$
(17)

where γ is the tumor net growth rate, δ = δ(t) is the death rate induced by chemotherapy, τ is the mutation rate from sensitive to resistant cells, and λ is a factor that accounts for reduced death by therapy in resistant cells. As detailed in Fig. S12, the treatment is composed of three phases. First, the patients undergo a number of cycles of neoadjuvant chemotherapy (NACT). Then, surgery is performed, which reduces the total tumor volume by a factor β, irrespective of cells being sensitive or resistant. Finally, another series of chemotherapy cycles is performed. During chemotherapy, δ = δ0, whereas we set this parameter to zero after chemotherapy and until tumor relapse. Relapse occurs when T reaches the value TR.

Equations (16) and (17) can be integrated analytically, and their solutions used to build the probability distribution of the clinical output (TtR, in this case). To obtain the pdf from the model, we calculate the time the tumor takes to reach the cell number at relapse TR starting from the cell number after therapy. We perform this calculation using the initial tumor cell number of each patient, and by varying both the initial fraction of S cells, x0, and the chemotherapy-induced death rate, δ0. We then obtain the patient-specific probability distribution from the histogram of TtR, similarly to what is done for CLL. For x0, we select a range between 0.4 and 0.9, accounting for tumors with different initial degrees of intrinsic resistance32. For δ0, we first use a uniform distribution between 0.1 and 10 per day, accounting for a wide variation in death rates. The latter choice produces an almost flat distribution for the clinical output (see Fig. S13). To improve the mathematical model parametrization, we use the information about the tumor volume change after the first cycle of chemotherapy, which is included in the dataset. By fitting T obtained from Eqs. (16) and (17) to the observed volume change, we find a value of δ0 for each patient in the dataset32. We take the mean value of these rates and use it to update the model pdf (see Fig. S14). We consider a range for δ0 that is centered around its mean value across the patients, within an interval of ±40%. Selecting other ranges provides similar results; however, a variation of 40% returns the lowest MSE. The analytical integration of Eqs. (16) and (17), as well as additional details about model parametrization, are available in Supplementary Note 2.
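For constant δ, Eqs. (16) and (17) are linear and admit a closed-form solution; off treatment (δ = 0), their sum gives dT/dt = γT, so the regrowth time follows from a single logarithm. The sketch below illustrates both facts. It is not the derivation of Supplementary Note 2, and the function names and parameter values are hypothetical.

```python
import numpy as np

def sensitive_resistant(t, S0, R0, gamma, delta, tau, lam):
    """Closed-form S(t), R(t) of Eqs. (16)-(17) for a constant death rate delta."""
    s_rate = gamma - delta - tau          # net rate of the sensitive population
    r_rate = gamma - lam * delta          # net rate of the resistant population
    S = S0 * np.exp(s_rate * t)
    if np.isclose(s_rate, r_rate):
        transfer = tau * S0 * t * np.exp(r_rate * t)
    else:
        transfer = tau * S0 * (np.exp(s_rate * t) - np.exp(r_rate * t)) / (s_rate - r_rate)
    R = R0 * np.exp(r_rate * t) + transfer
    return S, R

def time_to_relapse(T_post, T_relapse, gamma):
    """Regrowth time with delta = 0, where T = S + R obeys dT/dt = gamma * T."""
    return np.log(T_relapse / T_post) / gamma

# Hypothetical values (cell numbers, rates per day):
gamma, tau, lam = 0.05, 1e-3, 0.2
ttr = time_to_relapse(T_post=1e8, T_relapse=1e9, gamma=gamma)
S_end, R_end = sensitive_resistant(ttr, 0.7e8, 0.3e8, gamma, 0.0, tau, lam)
```

Repeating the calculation over sampled values of x0 and δ0, which set the cell number at the end of therapy, and taking the histogram of the resulting times would give a distribution of the kind described above.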

### Unmodelable variable for the ovarian cancer study

We build the data-derived pdf for the ovarian cancer example by exploiting the information about the age of the patients at diagnosis. As in the previous test cases, we first build the sub-dataset (TtR, A) by entering the information of each patient (here, A is the patient age). Then, we apply the KDE using Silverman’s rule to estimate the bandwidth and calculate the joint probability $$p({\rm{TtR}},A)$$. The data-derived pdf for each patient $$p({\rm{TtR}}| {A}^{*})$$ is finally obtained over the domain of the clinical output TtR by considering the patient-specific age A = A*.
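The conditioning step can be sketched as follows: the joint Gaussian KDE is evaluated along the TtR axis with the age fixed at A*, and the resulting slice is renormalized. The function name and the synthetic (TtR, A) data are assumptions and do not reproduce the clinical dataset.

```python
import numpy as np

def conditional_pdf(y_data, x_data, y_grid, x_star):
    """p(y | x = x_star) from a 2D Gaussian KDE with Silverman bandwidths."""
    data = np.column_stack([y_data, x_data]).astype(float)
    n, d = data.shape
    h = data.std(axis=0, ddof=1) * (4.0 / ((d + 2.0) * n)) ** (1.0 / (d + 4.0))
    zy = (y_grid[:, None] - data[None, :, 0]) / h[0]   # shape (grid points, n)
    zx = (x_star - data[:, 1]) / h[1]                  # shape (n,)
    joint_slice = np.exp(-0.5 * (zy**2 + zx[None, :] ** 2)).sum(axis=1)
    dy = y_grid[1] - y_grid[0]
    return joint_slice / (joint_slice.sum() * dy)      # renormalize over y

# Synthetic data in which TtR (months) decreases with age (years); purely illustrative.
rng = np.random.default_rng(0)
age = rng.uniform(40.0, 80.0, size=300)
ttr = 30.0 - 0.2 * age + rng.normal(0.0, 1.0, size=300)
ttr_grid = np.linspace(5.0, 30.0, 501)
dttr = ttr_grid[1] - ttr_grid[0]

pdf_age_45 = conditional_pdf(ttr, age, ttr_grid, 45.0)
pdf_age_75 = conditional_pdf(ttr, age, ttr_grid, 75.0)
mean_age_45 = (ttr_grid * pdf_age_45).sum() * dttr
mean_age_75 = (ttr_grid * pdf_age_75).sum() * dttr
```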

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Results

We introduce the key ideas of the proposed methodology in the context of brain tumor growth, leaving the full derivation of the equations and their general form to the Supplementary information (see Supplementary Note 1).

Gliomas are aggressive brain tumors generally associated with low survival rates33. One of the most important hallmarks of this type of tumor is its invasive behavior, combined with a marked phenotypic plasticity and infiltrative morphology16. Clinical needs have led to the development of several mathematical models to support clinicians in the treatment of the disease34. As a first test case, we synthetically generate a dataset of glioma patients using a recently published system of partial differential equations (PDEs)14,15. This complex mathematical model (‘full model’, in the following) provides a set of in silico patients, which represents our synthetic reality and serves as a benchmark to evaluate the performance of the proposed BaM3 method. Our goal is to obtain a personalized prediction of the clinical observables of the patients, combining their ‘modelable’ and ‘unmodelable’ information. A simplified mathematical model (with respect to the full model used to generate the patients) is used to generate predictions of clinical outputs starting from the modelable variables. In turn, a machine learning algorithm produces predictions of the same clinical outputs by leveraging the information contained in the unmodelables. As displayed in Fig. 2, the core idea of the BaM3 method is to use the results of the mathematical model to guide the predictions of machine learning. In more technical terms, the pdf obtained from the mathematical model (‘model-derived pdf’, in the following) works as a Bayesian prior that multiplies the pdf obtained from a nonparametric regression algorithm (‘data-derived pdf’). The product of these two pdfs returns an estimate of the pdf for the clinical outputs of interest. More details about the formal definition of the BaM3 method and the mathematical details are available in the ‘Methods’ and Supplementary Note 1.

### Improving predictions of synthetic glioma growth

For this first test case, we deal with two clinical observables, i.e., the TS and IW. The first quantity is related to tumor burden, whereas the second accounts for tumor infiltration in the host tissue. The modelable variable is the tumor cell density c, whereas we consider the amount of oxygen $$\bar{n}$$ and vasculature $$\bar{v}$$ in the tissue to be the patients’ unmodelables (see the ‘Methods’). Then, given the patient-specific modelable and unmodelable variables at the clinical presentation time t0, the BaM3 method produces the probability of observing certain values of the clinical outputs at a specific prediction time tp (see Fig. 2). The data-derived pdf is obtained through a normal KDE28,29,35 incorporating the information about the patient ensemble. The latter is generated from the full model, at the diagnosis time td. Then, the model-derived pdf is calculated using the simplified mathematical model for each patient. In particular, we use the FK model23 to produce a map of possible IW and TS starting from the tumor cell density of each virtual patient (see the ‘Methods’ for further information about the KDE and modeling steps).

Figure 3a–c shows the results of applying the BaM3 method to a representative patient. We select a clinical presentation time t0 = 24 months and a time of prediction tp = 9 months. The model-derived pdf obtained from the FK model is shown in Fig. 3a. Interestingly, the prediction of the model in this particular case shows two peaks, one with low TS and high IW and another with opposite properties. We calculate the expected values of TS and IW from the pdf obtained with the FK model and compare them to the ‘true’ values given by the full model. As shown in the plot, for this patient the presence of a bimodal distribution shifts the expected values far from the true ones. We then apply the BaM3 method, making use of the probability calculated from the KDE, shown in Fig. 3b. The latter pdf takes into account the correlations between the clinical outputs and the unmodelable variables present in the patient ensemble. For this patient, the unmodelable distribution selects the probability mode closer to the true IW and TS values, as displayed in Fig. 3c (another example for a different patient is given in Fig. S4). These results demonstrate the ability of the proposed method to correct the predictions obtained by using the mathematical model alone, and to produce an expected value of the pdf that is closer to the ground truth.

We apply the BaM3 method for two clinical presentation times, at 12 and 24 months, and compare its outcomes with those provided by the full model at increasing prediction times (Fig. 3d–i). For each patient, we calculate the relative error between the clinical outputs obtained from the full model and the expected values of the pdf calculated from the FK model (dm) and after implementing the BaM3 method (db). These nondimensional errors are calculated as

$${d}_{k}=\sqrt{\frac{1}{{N}_{y}}\mathop{\sum }\limits_{i=1}^{{N}_{y}}{\left(1-\frac{\langle {y}_{i}^{k}\rangle }{{y}_{i}^{r}}\right)}^{2}}$$
(18)

where k = m, b; Ny is the number of clinical outputs (here two, i = IW, TS); $$\langle {y}_{i}^{k}\rangle$$ are the expected observable values calculated from the FK model (k = m) and the BaM3 method (k = b); and $${y}_{i}^{r}$$ are the observable values obtained from the full model. We calculate the errors dm and db for each patient at different prediction times. Then, we compare the corresponding errors and evaluate whether the BaM3 method improved, deteriorated, or left unchanged the prediction from the FK model, i.e., db < dm, db > dm, or db ~ dm, respectively (see the ‘Methods’). We denote the ratio of improved, unchanged, and deteriorated cases with respect to the total number of simulated patients as Si, Su, and Sd, respectively.
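As a worked illustration of the relative error, the sketch below assumes that the sum is normalized by the number of clinical outputs Ny, so that the error is a root-mean-square relative deviation. The function name and the numbers are hypothetical.

```python
import numpy as np

def relative_error(expected, truth):
    """Root-mean-square relative deviation of the expected outputs from the ground truth."""
    expected = np.asarray(expected, float)
    truth = np.asarray(truth, float)
    return np.sqrt(np.mean((1.0 - expected / truth) ** 2))

# Hypothetical (IW, TS) values: each expected output is off by 10% of the true value.
d_example = relative_error([11.0, 4.5], [10.0, 5.0])   # sqrt((0.01 + 0.01) / 2) = 0.1
d_perfect = relative_error([10.0, 5.0], [10.0, 5.0])   # exact prediction gives 0
```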

Both relative errors dm and db increase with increasing prediction time, as shown in Fig. 3d, g for the two clinical presentation times considered. However, after applying the BaM3 method the errors decrease, especially at later times. In general, an improvement is noticeable both in the median values and in the spread of the data. Interestingly, the relative error obtained from the BaM3 method increases at a lower rate than the relative error obtained from the FK model.

We also calculate the effective variance of the predictions as the logarithm of the determinant of the covariance matrices relative to the model and BaM3 pdfs (identified by sm and sb, respectively; see the ‘Methods’). This quantity reflects the spreading of the pdfs over the (TS, IW) plane, with higher values denoting more uncertainty in the predictions. For both clinical presentation times (Fig. 3e, h), the BaM3 method provides narrower pdfs, more tightly centered around their expected value than in the FK model-derived case.

Finally, the stacked bars in Fig. 3f, i show that BaM3 performs well at later prediction times, and remarkably well (improvement ratio Si close to 1) at the later clinical presentation time t0 = 24 months. For t0 = 12 months (Fig. 3f), the proposed method is not able to improve predictions until a prediction time of 6 months. Then, for tp = 6, 9, and 12 months the advantages of using BaM3 over the FK model are unambiguous. On the other hand, for the clinical presentation time of 24 months (Fig. 3i) both Su and Sd decrease significantly for prediction times equal to or greater than 3 months. The ratio of improved cases Si reaches almost 100% at each of the last three prediction times, clearly outperforming the FK model. The error bars in Fig. 3f, i denote the variability in the results that is obtained by replicating the study 10 times, each with N = 500 randomly generated patients. Fig. S5 shows similar results when decreasing the number of patients. Notably, the scores for N = 500, 250, 100, and 50 are very close, slightly improving as the number of patients increases. The variability in the 10 replicates also decreases for higher values of N.

We also calculate the prediction scores using the mode of the distribution instead of the expected value (see Fig. S6). When the pdfs display multiple maxima, we consider the average of the relative errors between the values of the full model (i.e., the synthetic reality) and the different peaks. The performance of the BaM3 method degrades noticeably with respect to using the expected value. Improvement in predictions is observed only for later clinical presentation and prediction times.
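
The averaging over multiple maxima can be sketched as follows. This is a deliberately simplified one-dimensional illustration: the study works with pdfs over the (TS, IW) plane and uses the relative error defined in the ‘Methods’, whereas here a single output and the error |1 − peak/reference| are assumed. The function names and the toy pdf are hypothetical.

```python
def local_maxima(grid, pdf):
    """Grid locations of the interior local maxima of a 1D pdf."""
    return [grid[i] for i in range(1, len(pdf) - 1)
            if pdf[i] > pdf[i - 1] and pdf[i] >= pdf[i + 1]]

def mean_peak_error(grid, pdf, reference):
    """Average relative error between each peak location and the reference value."""
    peaks = local_maxima(grid, pdf)
    errors = [abs(1.0 - p / reference) for p in peaks]
    return sum(errors) / len(errors)

# Toy bimodal pdf (unnormalized, illustrative values) with peaks at 2 and 8.
grid = list(range(11))
pdf = [0, 1, 3, 1, 0, 0, 0, 1, 2, 1, 0]

peaks = local_maxima(grid, pdf)
error = mean_peak_error(grid, pdf, reference=8.0)
print(peaks, error)
```

A peak far from the reference value raises the averaged error even when another peak matches it exactly, which is one way a mode-based score can be penalized relative to a score based on the expected value.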

In summary, the BaM3 method is able to correct the FK model predictions for most of the patients, particularly at later clinical presentation and prediction times. The improvement in the prediction occurs by: (i) decreasing the median relative error between expected observable values and ground truth; (ii) decreasing the rate at which the error increases with prediction time; and (iii) decreasing the variance associated with the probability distributions.

#### BaM3 performance depends on the clinical output

Even though the BaM3 method performs well for the majority of patients, there are some cases for which it fails to improve the predictions of the mathematical model. We analyze the failure cases by splitting the errors dm and db into the two partial errors

$$\Delta \mathrm{IW}_{k}=\sqrt{\left(1-\frac{\langle \mathrm{IW}^{k}\rangle }{\mathrm{IW}^{r}}\right)^{2}},\quad \Delta \mathrm{TS}_{k}=\sqrt{\left(1-\frac{\langle \mathrm{TS}^{k}\rangle }{\mathrm{TS}^{r}}\right)^{2}}$$
(19)

where k = m, b; 〈IWk〉 and 〈TSk〉 are the expected values of the clinical outputs obtained from the mathematical model and BaM3 pdfs (k = m and b, respectively); and IWr and TSr are the values of these quantities from the full model. Figure 4 shows how these partial errors are distributed over the presentation and prediction times. The dashed line in the plots highlights the neutral boundary, where the partial errors of the FK model and BaM3 method are equal. Above this line, the proposed BaM3 method deteriorates the model predictions, whereas below it the BaM3 method improves predictions. The red dots in the scatter plots represent the patients for which the BaM3 method fails (‘failure cases’, in the following). Beyond a prediction time of 1 month, at which no characteristic pattern is evident, the plots show that failure cases are generally associated with regions where the BaM3 method underperforms the FK model with respect to TS (ΔTSb > ΔTSm). Interestingly, the same failure cases belong to regions in which ΔIWb < ΔIWm: the BaM3 method is improving the IW predictions and at the same time deteriorating the TS predictions. This happens for both t0 = 12 and 24 months; however, the number of failure cases is considerably higher for the earlier presentation time. For the specific case under consideration, lower performance of the BaM3 method is therefore associated with its inability to correct the FK model predictions for TS, a tendency that is mitigated at the later presentation time by strong corrections for IW.
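
As a minimal sketch of Eq. (19), note that the square root of a squared quantity is its absolute value, so each partial error reduces to |1 − 〈X〉/Xr|. The helper names and the numerical values for the hypothetical patient below are assumptions for illustration and are not taken from the dataset.

```python
def partial_error(expected, reference):
    """Partial error of Eq. (19): sqrt((1 - <X^k>/X^r)^2) = |1 - <X^k>/X^r|."""
    return abs(1.0 - expected / reference)

def classify(iw_model, iw_bam3, ts_model, ts_bam3, iw_ref, ts_ref):
    """Report, for each output, whether the BaM3 partial error is below the model's."""
    d_iw_m = partial_error(iw_model, iw_ref)
    d_iw_b = partial_error(iw_bam3, iw_ref)
    d_ts_m = partial_error(ts_model, ts_ref)
    d_ts_b = partial_error(ts_bam3, ts_ref)
    return {"IW improved": d_iw_b < d_iw_m, "TS improved": d_ts_b < d_ts_m}

# Hypothetical patient: BaM3 moves IW closer to the reference and TS further away.
result = classify(iw_model=3.0, iw_bam3=2.2, ts_model=19.0, ts_bam3=16.0,
                  iw_ref=2.0, ts_ref=20.0)
print(result)
```

A patient of this kind would fall below the neutral boundary in the IW panel and above it in the TS panel, which is the pattern described for the failure cases.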

#### Transient behavior of the unmodelable distribution is associated with limited improvements

To investigate the reasons for the poor performance of the BaM3 method in improving the predictions for one of the clinical observables, we analyze the behavior of the pdf arising from density estimation, i.e., the data-derived pdf. Figure 5 shows the temporal evolution of this quantity for different clinical presentation times t0. Figure 5a shows a plot of the unmodelable pdf for a representative patient over the clinical output space. From a pdf that covers a limited region in the (IW,TS) plane, the probability distribution spreads over a broader area as the presentation time increases. The center of mass of the distribution, however, tends to converge to a more specific region as time progresses. This is more evident in Fig. 5b, c, showing the marginal probabilities for IW and TS, calculated from the distribution in Fig. 5a. The marginal distributions become broader for both IW and TS, but in the first case their peak stabilizes at later t0 times. On the contrary, the peak of the marginal probability for TS moves towards larger values at later times. To quantify this behavior across the different patients, we then evaluate the degree of overlap between the marginal probabilities at two subsequent t0. Results from this calculation are plotted in Fig. 5d, e for the overlap between the distributions at presentation times t0 of 12 and 18 months and between 24 and 30 months. Here, the degree of overlap is calculated as the area of overlap for the IW and TS marginal distributions. Values close to one represent maximum overlap, whereas values near zero are associated with poor overlap between the two marginal pdfs. As a rough approximation, when this overlap score is high the marginal pdf is close to a steady state (since the pdf has not moved over time), and vice versa. For the earlier times in Fig. 5d, the patients are mostly scattered along a line of increasing IWr and TSr with points where the overlap is poor (close to 0.4 in certain regions).
On the other hand, for the later times in Fig. 5e the patients are shifted towards higher values of overlap. Moreover, a horizontal line of high overlap for the IW output is visible for a large patient ensemble, pointing to a stabilization towards a steady state for the IW at later presentation times. This explains the lower performance of the BaM3 method at t0 = 12 months, since the pdf from the KDE that should correct the model predictions is projecting the model pdf over (IW,TS) values that are outdated, far from the steady state. The situation improves for the case of t0 = 24 months; even though the correction of the BaM3 method for the TS might be wrong in some cases, the pdf for the IW has stabilized and points towards the correct value. In most cases, the correction for the IW outperforms the one for the TS, which leads to a general improvement of predictions by the BaM3 method.
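
One plausible way to compute such an overlap score, sketched here under the assumption that the area of overlap is the integral of the pointwise minimum of two normalized marginals, is shown below. The grid, the Gaussian-shaped marginals, and all function names are illustrative and are not taken from the study.

```python
import math

def normalize(pdf, dx):
    """Rescale grid values so that the trapezoidal integral equals one."""
    area = dx * (sum(pdf) - 0.5 * (pdf[0] + pdf[-1]))
    return [p / area for p in pdf]

def overlap(pdf_a, pdf_b, dx):
    """Area under the pointwise minimum of two normalized marginals."""
    low = [min(a, b) for a, b in zip(pdf_a, pdf_b)]
    return dx * (sum(low) - 0.5 * (low[0] + low[-1]))

def gaussian(x, mu, sigma):
    """Unnormalized Gaussian shape used as a stand-in marginal."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

dx = 0.01
xs = [i * dx for i in range(2001)]  # grid on [0, 20]
early = normalize([gaussian(x, 8.0, 1.0) for x in xs], dx)
late_same = normalize([gaussian(x, 8.0, 1.0) for x in xs], dx)
late_shifted = normalize([gaussian(x, 12.0, 1.0) for x in xs], dx)

score_steady = overlap(early, late_same, dx)     # marginal has not moved
score_moving = overlap(early, late_shifted, dx)  # peak has drifted
print(score_steady, score_moving)
```

With this definition, a marginal that has not moved between two presentation times scores close to one, whereas a marginal whose peak has drifted scores close to zero.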

#### Outlier patients challenge the method’s performance

To explain the different behavior for IW and TS, we investigate the distribution of the failure cases over the full model parameter space. In general, the BaM3 method performs poorly for patients at the extremes of the parameter space, who represent outlier patients. When plotting the patients in a scatter plot over cell motility and proliferation rate (Fig. 6), the points with high motility–high proliferation rates and high motility–low proliferation rates show the highest number of failure cases for both clinical presentation times t0 of 12 and 24 months. We also checked the distribution of failure cases over the other model parameters, but no particular pattern was evident (see Fig. S7). Notably, patients falling into these high motility–high/low proliferation regions show the highest values for IW and TS (see ref. 14 and Fig. S8). Highly invasive and massive neoplasms are inadequately described by the pdf from the KDE, as they represent the extreme cases of the probability distribution. As a result, the FK model predicts the clinical outcomes better than the BaM3 method, since in the latter the correction from the dataset points towards smaller values of IW and, especially, TS.

### Applying the method to real CLL patients: the effect of unmodelables

In addition to the proof-of-concept applied to in silico data, we test the BaM3 methodology on a cohort of real patients suffering from CLL. This cancer involves B cells and is characterized by the accumulation of lymphocytes in the blood, bone marrow, and secondary lymphoid tissues36. In the past, CLL was considered to be a homogeneous disease of minimally self-renewing B cells, which accumulate due to a faulty apoptotic mechanism. This view was questioned by recent findings, suggesting a more heterogeneous neoplastic population continuously interacting with its microenvironment 37,38,39,40. Accumulation of leukemic cells occurs because of survival signals originating from the external environment and interacting with leukemic cells through a variety of receptors. The nature of this cross-talk with the environment is a current matter of research, featuring in vitro as well as in vivo experiments. One of the most significant experiments involving human patients was that of Messmer et al.31. Messmer and his co-workers inferred the kinetics of B-CLL cells from a group of patients through non-invasive labeling and mathematical modeling. Their investigation was quite thorough and involved the collection of several quantities related to patients’ personal data (gender, age, etc.) and status of the disease (years since diagnosis, treatments, mutation status, etc.). They measured the fraction of neoplastic labeled cells in the blood of the patients, and fitted an ODE compartmental model to the dynamics that they observed. The model included three parameters, i.e., the daily water exchange rate (fw), the B cell birth rate (b), and the relative size of the blood compartment (vr).

We use the same model as Messmer and colleagues as the input for the BaM3 method, but discard the patient-specific fitting provided in their publication. Our aim is to show that, even when an individualized model parametrization is unknown, coupling the information given by the unmodelables can provide good patient-specific predictions. To accomplish this, we run simulations over uniform parameter ranges to obtain the pdf of the labeled cell fraction at day 50 (f50), which is also the sole modelable variable in this dataset (see Fig. S9). Then, we incrementally select one to four unmodelable variables from the patients’ dataset and build the data-derived pdf using the same KDE method as in the previous in silico example. The BaM3 method couples the two prediction distributions to obtain the pdf for the clinically relevant output (see Fig. S10). We show the results of this procedure in Fig. 7a, where we compare the BaM3 predicted values against the patient f50 values reported in ref. 31. The fraction of labeled cells predicted by the BaM3 method agrees well with the reported data, especially when we increase the number of unmodelables used for density estimation. The inset shows how the MSE of BaM3 predictions decreases after considering all the possible combinations of unmodelables. Figure 7b shows how the probability distribution generated from the KDE changes for a representative patient. As the number of unmodelables increases, the mode of the distribution shifts towards the correct value of f50, here denoted by a red dashed line. From Fig. 7a it is also possible to note that, even if the majority of points lies close to the perfect prediction line, the predictions of a few patients are significantly mismatched with respect to the corresponding real values. This occurs because these patients belong to the extremes of the parametric space (see Fig. S11). 
Patients characterized by outliers in their parametrization are under-represented in the modelable pdf due to the uniform sampling of the parameter space, and it is challenging for the data-derived correction to improve predictions for them.
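
The coupling step can be illustrated with a short sketch. It assumes that the two pdfs are combined as a normalized pointwise product on a common grid, which is consistent with the model pdf acting as a prior in the Bayesian framework; the exact formulation is the one given in the ‘Methods’. The grid for f50, both Gaussian-shaped pdfs, and all function names are hypothetical stand-ins rather than quantities derived from the CLL dataset.

```python
import math

def normalize(pdf):
    """Rescale grid values so that they sum to one."""
    total = sum(pdf)
    return [p / total for p in pdf]

def couple(model_pdf, data_pdf):
    """Assumed coupling: pointwise product of the two pdfs, renormalized."""
    return normalize([m * d for m, d in zip(model_pdf, data_pdf)])

def mode(grid, pdf):
    """Grid location at which the pdf is largest."""
    return grid[max(range(len(pdf)), key=pdf.__getitem__)]

# Hypothetical grid for the labeled-cell fraction f50.
grid = [i / 100 for i in range(101)]
# Broad model pdf (stand-in for simulations over uniform parameter ranges).
model_pdf = normalize([math.exp(-0.5 * ((f - 0.30) / 0.20) ** 2) for f in grid])
# Narrower data-derived pdf (stand-in for the KDE over the unmodelables).
data_pdf = normalize([math.exp(-0.5 * ((f - 0.50) / 0.10) ** 2) for f in grid])

bam3_pdf = couple(model_pdf, data_pdf)
predicted_f50 = mode(grid, bam3_pdf)
print(predicted_f50)
```

In this toy example the mode of the coupled pdf lies between the two input modes and closer to that of the narrower, more informative pdf.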

### Prediction of the time-to-relapse on a real ovarian cancer patient cohort: the importance of adequate model parametrization

To provide another application of the BaM3 method to a real scenario, we consider the case of patient response to therapy in high-grade serous ovarian cancer (HGSOC). This type of cancer is the most common epithelial ovarian cancer subtype, accounting for 70–80% of ovarian cancer-related deaths41,42. In addition, due to treatment resistance, the 5-year survival rate in HGSOC is less than 50%43,44. Indeed, the contribution of resistance mechanisms to tumor relapse after therapy is currently an active matter of research, recently backed up by evolutionary studies 45,46,47.

We start from the clinical dataset provided in a recent publication32, and elaborate a strategy to predict the TtR in ovarian cancer patients that makes use of the BaM3 methodology. The database of patients consists of 20 individuals, who are subject to the following treatment schedule (see Fig. S13). First, the patients receive neoadjuvant chemotherapy (NACT), consisting of different cycles of carboplatin and paclitaxel chemotherapy. Then, surgery is performed, followed by further cycles of adjuvant chemotherapy. We propose a low-dimensional mathematical model to predict tumor TtR after treatment for each patient, which takes into account the presence of two cell subtypes. In particular, we include cells that are sensitive or resistant to chemotherapy. In addition, we consider the age of the patient at diagnosis as the unmodelable quantity used by the density estimator. Full details of the model and methodology are available in the corresponding sections of the ‘Methods’ and Supplementary Note 2.

As in the previous sections, the pdf from the mathematical model is obtained by simulating the latter over the parameter space. In this case, we focus on two parameters, namely the initial fraction of sensitive cells x0 and the death rate induced by chemotherapy δ0. First, we consider a uniform distribution of both parameters. We assume x0 between 0.4 and 0.9, in agreement with the degree of variability reported in the publication from which we take the dataset32. Since we lack any information about δ0, we select a wide range, from 0.1 to 10 days⁻¹. This results in an almost uniform pdf from the mathematical model, as shown in Fig. S13. In this condition, the pdf from the model enters the BaM3 as an uninformative prior in the Bayesian framework, leaving predictions to rely only on the pdf generated from the density estimation of the unmodelables. Note that, in these settings, BaM3 reduces to nonparametric regression29. We calculate the MSE using the mode of the distributions in the uninformative case, denoted as MSEun, and find MSEun = 38.901 months². As a next step, we use the additional information provided in the dataset to improve the parametrization of the mathematical model. Indeed, the dataset reports the tumor volume before and after the first cycle of therapy, as measured from clinical imaging32. We fit the value of δ0 for each patient and take the mean of all these values as the center of another uniform distribution. We apply the BaM3 method using the newly generated pdf from the model and obtain a lower MSE, i.e., MSEfit = 30.895 months² (see Fig. S14). Better parametrization therefore results in improved performance of the method. In addition, applying BaM3 to a better parametrized model yields improved predictions, as shown in Fig. 8. The scatter plot in Fig. 8a shows reduced errors in BaM3 predictions with respect to the ones from the mathematical model or density estimation alone. Also, Fig. 8b displays the outcome of the method for two representative patients. In both cases, the pdf arising from BaM3 has its mode closer to the real TtR (dashed line) than the modes obtained from the model or density estimation pdfs. This shows the potential of the BaM3 method, which is able to perform better than the single techniques upon which it is based.
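
The role of an uninformative model pdf, and the mode-based MSE, can be illustrated with a toy sketch. It assumes that the coupling is a normalized pointwise product of the two pdfs (the exact formulation is given in the ‘Methods’); the TtR grid, the shape of the data-derived pdf, and the predicted and observed values are hypothetical and are unrelated to the reported MSE values.

```python
def normalize(pdf):
    """Rescale grid values so that they sum to one."""
    total = sum(pdf)
    return [p / total for p in pdf]

def couple(model_pdf, data_pdf):
    """Assumed coupling: pointwise product of the two pdfs, renormalized."""
    return normalize([m * d for m, d in zip(model_pdf, data_pdf)])

def mode(grid, pdf):
    """Grid location at which the pdf is largest."""
    return grid[max(range(len(pdf)), key=pdf.__getitem__)]

def mse(predicted, observed):
    """Mean squared error between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

grid = list(range(41))                     # hypothetical TtR grid, in months
data_pdf = normalize([1.0 / (1.0 + (t - 14) ** 2) for t in grid])
flat_model = normalize([1.0] * len(grid))  # uninformative (uniform) model pdf

coupled = couple(flat_model, data_pdf)
same_as_data = all(abs(c - d) < 1e-12 for c, d in zip(coupled, data_pdf))

# Mode-based MSE for two hypothetical patients (predicted vs. observed TtR, months).
toy_mse = mse([mode(grid, coupled), 10], [12, 13])
print(same_as_data, toy_mse)
```

Under this assumption a uniform model pdf leaves the data-derived pdf unchanged, so the prediction is driven entirely by the density estimation step.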

## Discussion

In the last few years, mathematical modeling and machine learning have emerged as promising methodologies in the biomedical field 48,49,50. However, several challenges persist and limit the prediction accuracy of both approaches. Among these issues, we identified the lack of knowledge of the mechanisms that govern the system under study (C1), and the paucity of time points at which patient information is available (C2), as significantly limiting the performance of both mathematical models and machine learning techniques. In this work, we presented a method (BaM3) to couple mathematical modeling and density estimation in a Bayesian framework. The goal of BaM3 is to improve personalized tumor burden prediction in a clinical setting. This coupling makes it possible to address the aforementioned (C1) and (C2) challenges, by exploiting the strengths of the respective methodologies and integrating them in a complementary way.

In particular, our proposed method aims to address a pressing problem in personalized medicine, namely the limited number of time-points at which patient data are collected. This implies that data assimilation methods, such as Kalman filters or particle filters51,52, which require multiple data time-points integrated into a mechanistic model, cannot generally be used. In this regard, the BaM3 method can be regarded as a one-step data assimilation method. Compared to other methodologies that combine outputs from mathematical models and measured data, such as Bayesian Melding53, History Matching54, Bayesian Model Calibration55, or Approximate Bayesian Computation56, the BaM3 method is not aimed at parameter estimation to better calibrate the mathematical model. Instead, its goal is to improve predictions of mathematical models by empowering them with knowledge from variables that are not usually considered (the ‘unmodelables’, in our framework). This is done without exact knowledge of the parameters of the mathematical model; indeed, we calculated the pdfs of the modelable variables using a uniform sampling of the parameter space. Better estimation of the model parameters improves the outcomes of the method (as shown in the ovarian cancer test case), but it is not required for the methodology to be applied.

First, we tested the BaM3 method on a synthetic dataset of patients focusing on tumor growth dynamics. Our approach was able to improve the predictions of an FK model for the majority of the virtual patients, with significant improvements at later clinical presentation times. In addition, we tested the proposed methodology on two clinical datasets related to cancer, concerning tumor growth in leukemia and ovarian cancer patients. We compared the outcomes of the BaM3 method to the reported data and found excellent predictive capability. When analyzing the cases for which the performance of the BaM3 method was not optimal, we came across some limitations that should be addressed when applying the methodology to real cases.

The first limitation regards the selection of the proper unmodelable variables. These are quantities that cannot be easily mathematically modeled, but can be correlated to the patient clinical outputs. For our proof-of-concept we selected only a few unmodelables, but in principle multiple quantities could be considered at the same time. Moreover, the most important unmodelables could be selected in a process of feature selection similar to the ones usually adopted in machine learning, providing better accuracy for the predictions57,58. We note as well that the method is open to progress in knowledge: should an unmodelable variable become modelable because of an increased understanding of the biological mechanisms, this variable can change side and become modelable.

One should also propose an adequate mathematical model that describes the dominant dynamics of the disease, as shown in the last case for ovarian cancer. A better parametrization of the model facilitates the work of density estimation, considerably reducing prediction errors. Beyond better parametrization, we also advocate mathematical models that incorporate enough of the mechanisms underlying the modeled phenomenon. In the case of ovarian cancer, we show in Supplementary Note 2 that a simplified model (with respect to the two cell populations presented in the ‘Methods’) is not able to provide good predictions when used in the context of the BaM3 method (see also Fig. S15).

Care must be taken with the selection of the metric that should be improved by the BaM3 method. For the in silico case, for example, considering the expected value of the final pdf resulted in better method performance when compared to selecting the pdf mode (see Figs. 3 and S5). This was probably due to the very similar natures of the FK and full models. Indeed, for earlier clinical presentation times the FK model is already ‘primed’ towards the correct solution (in terms of outcomes of the full model); applying the BaM3 method might result in adding noise to the FK prediction, degrading the final prediction. However, for some patients the FK model provides pdfs with multiple local maxima, sometimes far away from the full model values. In these instances (see Fig. S3), the BaM3 method is able to select the correct mode, shifting the pdf towards the correct values. Therefore, a good practice would be to try multiple pdf metrics and test the BaM3 method on each of them. This would result in a more thorough understanding of the problem, eventually allowing for better predictions.

Another important issue is that the correlations between unmodelables and clinical outputs should be persistent over time, evolving on a timescale that is faster than the dynamics of the problem. In our synthetic dataset this was partially accomplished at later clinical prediction times, especially for the case of the tumor infiltration width. Indeed, the unmodelable variables need to provide as much time-invariant information as possible on the clinical output variables, implying an equilibrated pdf. Such data can be, for instance, of genetic origin (such as mutations) or come from other variables with a slow characteristic evolution time. We stress that it is the probability distribution of the unmodelables that has to be close to equilibrium; this does not require the unmodelable variables themselves to reach constant values, only that their values be drawn from a steady-state distribution.

We see room for improvement also concerning the selection of the density estimation method. We adopted a well-known form of nonparametric estimation through kernel density estimation, but other approaches could be tailored to a specific problem—especially when high-dimensional datasets come into play59. Moreover, introducing density estimation methods to be able to integrate categorical variables would greatly benefit the technique, especially in biomedical problems (e.g., it would be extremely beneficial to include the grade of a tumor, or the particular sequence of therapies that a patient has undergone). The modularity of the BaM3 method makes it extremely versatile, allowing one to change the density estimation step, the modeling part, or both of them at the same time to improve the final prediction scores.

Care should also be taken to generate pdfs that are able to cope with outliers. In our proof-of-concept we generated the probability distributions considering the same weight for every patient, irrespective of their position in the parameter space. Techniques able to identify these extreme cases and to improve their contribution to the final pdfs should be implemented to improve the performance of the method60.

In summary, we can identify three main actions that could be undertaken when these limitations hamper the predictive capabilities of BaM3: (i) one should look for ways to improve the mathematical model, designing it to be as informative as possible; (ii) then, effort should be made to constrain the model by a robust choice of parameters; finally, (iii) extreme care should be devoted to the selection of the most informative unmodelable variables.

We conclude by stating that the proposed method is not restricted to oncology. The core problem concerning clinical predictions is that data are heterogeneous and sparse in time along with lack of full mechanistic knowledge. Therefore, a vast variety of medical problems could be addressed by using the BaM3 approach. For instance, predicting the fate of renal grafts by using pre- and post-transplantation data is a prime application of our proposed methodology.