Computationally Efficient Modelling of Stochastic Spatio-Temporal Dynamics in Biomolecular Networks

Measurement techniques in biology are now able to provide data on the trajectories of multiple individual molecules simultaneously, motivating the development of techniques for the stochastic spatio-temporal modelling of biomolecular networks. However, standard approaches based on solving stochastic reaction-diffusion equations are computationally intractable for large-scale networks. We present a novel method for modeling stochastic and spatial dynamics in biomolecular networks using a simple form of the Langevin equation with noisy kinetic constants. Spatial heterogeneity in molecular interactions is decoupled into a set of compartments, where the distribution of molecules in each compartment is idealised as being uniform. The reactions in the network are then modelled by Langevin equations with correcting terms, that account for differences between spatially uniform and spatially non-uniform distributions, and that can be readily estimated from available experimental data. The accuracy and extreme computational efficiency of the approach is demonstrated on a model of the epidermal growth factor receptor network in the human mammary epithelial cell.


Results
Langevin equation with spatial heterogeneity. Consider the generic model of a ligand-receptor interaction network in a well-mixed solution given by the following five reactions: where R, L and C are the numbers of receptors, ligands and ligand-bound-receptor complex molecules, respectively, F(t) is the external source of ligand stimulation, Q R is the intra-cellular receptor generation rate, and k i for i = {on, off, e, t} is the kinetic rate of each chemical reaction. k on is the ligand-receptor binding rate, k off is the dissociation rate, and the complex and receptor are internalised with the rates, k e and k t , respectively. All of the kinetic rates given above are typically assumed to be constant. The probability that a ligand-bound complex (C) is produced from the chemical reaction between the receptor (R) and the ligand (L) in some small time interval, dt, is given by k on RLdt. This is called a propensity function and the propensities for the other five reactions are defined similarly. Now, the probability of the number of receptor molecules being reduced by one, P(dR = −1), during the time interval, dt, for the well-mixed condition is where dR is the increment in the number of receptors for the time interval, dt. Similarly, the probability of the number of receptor molecules increasing by one, P(dR = 1), for dt is The probability that neither of the above two reactions occurs is where terms involving higher orders of dt are considered negligible. The well-mixed condition considered above is a strong assumption, since it is known that heterogeneous spatial distributions can have significant impact on the responses of biomolecular networks. As illustrated in Fig. 1, the numbers of both receptor and ligand molecules would vary considerably across different regions of the cell surface, and thus the equations derived in (2), (3) and (4) would potentially have large differences from quantities measured by experiments. Typically, the reaction rates, k i , would be measured in a well-mixed condition in vitro, or calculated from some theoretical derivations, which most likely neglect spatial effects. To model the effects of spatial heterogeneity while keeping mathematical and computational complexity low, we derive a version of the Langevin equation with spatial fluctuations, as follows.
We assume that the heterogeneous space shown in Fig. 1 can be virtually divided into several subvolumes within which the well-mixed condition is valid. Then, the probabilities given above are modified as follows: where R A and L A are the number of receptors and ligands in the subvolume-A, the numbers in the other subvolumes are defined similarly, and the total numbers are given by Note that the kinetic constant, k on , would be measured in a well-mixed condition in vitro. To correct (2) to match with (5), we substitute (6) into the RL term of (2), i.e., Equation (5) is then rewritten using (7) as follows: where the correction term δ is given by By definition, δ is between 0 and 1, where δ = 0 corresponds to the well-mixed condition and δ = 1 corresponds to a complete separation of the molecules, i.e., no reaction occurs. In most biomolecular networks δ would take some value between 0 and 1, and, since the spatial distributions of the molecules might be continuously changing, the value of δ would also change over time.
The proposed modelling approach thus provides a compact parameter, δ, that captures spatial effects on the stochastic dynamics of a biomolecular network, without requiring the introduction of complex reaction diffusion equations. Note that, in principle, unimolecular reactions would not be affected by spatial heterogeneity, and thus Equation (3) remains the same for both the cases of a well mixed and non-uniform distribution. Only the chemical reactions involving two or more molecules will be affected by spatial heterogeneity.
The expected value of dR 2 is calculated as on off and the expected value of dR is given by where E(·) is the expectation over the sampling space. The variance of dR is obtained by where all dt 2 terms are negligible. Finally, the Langevin equation is derived by matching the mean and the variance given by (11) and (12), so that where dw R is the Brownian motion with variance equal to dt . The correction term, δ, would have its own stochastic temporal kinetics, for example, where a δ and b δ are unknown functions that determine the temporal evolution of δ, and dw δ could be the Brownian motion. In some limited cases, the exact or approximate form of the two functions could be modelled. In general, however, we would use experimental data to identify the structures and/or parameters in the functions determining the temporal evolution of δ. For example, if we assume that a δ and b δ are constant, then system parameter identification algorithms could be used to determine the constants 12 . This process is described in detail in the following example.

Example: Epidermal Growth Factor Receptor (EGFR) network in human cells. A model of the
ligand-receptor network kinetics of the human mammary epidermal growth factor receptor (EGFR) in the well-mixed state was developed in 13 , where the following set of ordinary differential equations are derived from the underlying chemical reactions, (1), as follows: Consider an initial impulse input for the ligand, i.e., L(0) = 10,000, assuming the presence of 200,000 receptors and no complexes. F(t) is set to zero, i.e, no external stimulation. The simulation results are shown in Fig. 2. In Fig. 2A,C, stochastic simulations with the well-mixed condition are performed using Gillespie's direct method, which is an exact stochastic simulation algorithm 14 . As expected, for the well-mixed condition, the 10 stochastic simulations of the model produce trajectories that all closely match the deterministic trajectories produced by solving the ordinary differential equations numerically.
However, even for the case where the well-mixed condition is still valid, the simulations including spatial diffusion produce different results from those generated by the ordinary differential equations or Gillespie's direct method, as shown in Fig. 2B,D. Here, the ligand and the receptor are uniformly distributed on the surface of the cell, which is assumed to be a sphere of the radius 100 μm. Since Gillespie's algorithm cannot take into account spatial effects, we solve the reaction-diffusion equations using MesoRD (Mesoscopic Reaction-Diffusion Simulator) 11 , a numerical solver for stochastic reaction diffusion equations using the next subvolume method.
In reality, it is highly probable that there would be significant differences in the time history of molecular number changes for non-uniform distributions compared to the case of a uniform distribution. To investigate this, a spatially non-uniform distribution is simulated in Fig. 3. Here, the ligands, indicated by the red dots in the figure, are distributed uniformly on the cell surface, while the receptors (not shown in the figure for clarity) are distributed non-uniformly on the cell surface. In this example, we make the distribution of the receptors approximately equal to the distribution of the complexes at t = 30 s in the figure, i.e., 99% of the total receptors are placed on the right side of the cell surface and 1% are distributed in the remaining area of the cell surface.
The time histories of the ligand and the complex numbers are compared with those calculated by the deterministic simulation in Fig. 4. The dynamic behaviours for both cases are clearly very different, showing that the explicit consideration of spatial heterogeneity is essential in order to correctly capture the dynamics of the molecules in the ligand-receptor network. However, exact simulation algorithms for stochastic reaction diffusion equations are extremely computationally expensive, and hence cannot be applied to larger networks. Moreover, in general, we do not know how the molecules are distributed -in practice we will only be able to measure total concentration changes.
To get around this problem, we use the proposed model based on the Langevin equation with spatial fluctuations, which gives the following stochastic equations:   The new Langevin equation with the optimal δ obtained by solving a minimisation problem provides a highly accurate simulation of the molecular time histories in less than one second (Noisy Langevin #1), compared with 38 minutes to compute one reaction-diffusion simulation using MesoRD. The inset shows the corresponding time history of δ, which is clearly able to capture the effects of spatial heterogeneity. Less accurate but still reasonable matched trajectories with even lower computational overheads are obtained using a random sampling of δ between upper and lower bounds that can be calculated from experimental measurements (Noisy Langevin #2).
where the independent parameters dw R , dw C and dw L have zero-mean Brownian motion with variance dt , and the dynamics of δ depend on the spatial distributions of the molecules involved in the chemical reactions. In order to obtain a time history of δ, assume that a web-lab experiment has been performed and multiple measurements have been obtained of the number of complexes, indicated by green circles in Fig. 4 (here, we use a set of sampled average values from the stochastic reaction-diffusion simulations as surrogate data points). An optimal δ for the time period (t k−1 , t k ] can then be calculated by solving the following minimisation problem: where k is an integer between 1 and N, N is the total number of measurements, t 0 is the initial time,  C k is the complex number measurement at t k , C k is the mean value of the complex number at t k provided by solving the noisy Langevin Equation (16), and dt is set to 1 s. The minimisation problem is a one-dimensional search, which can be efficiently solved by golden-section search or parabolic interpolation, as implemented in fminbnd in the MATLAB optimisation toolbox 15 . The results are shown in Fig. 4. As expected from the stochastic spatial-diffusion simulation setting, the spatial heterogeneity is larger at the beginning and slowly diminishes as the molecules interact and are diffused. The Langevin equations are integrated using the 1st-order Euler method. The comparisons with the trajectories from the solutions of the full 3-dimensional reaction diffusion equations are shown in Fig. 4, where both the number of ligands and the number of complexes (Noisy Langevin #1 in the figure) are well matched with the spatial-diffusion simulation. As shown in the figure, the proposed modelling approach provides a much better approximation of the true network dynamics than the ordinary differential equation based model, and in fact closely approximates the exact solution provided by solving the reaction diffusion equations.
Crucially, however, solving the noisy Langevin equations requires only a fraction of the computing time -less than a second to solve one complete time history for this example on a typical desktop computer. On the other hand, it takes about 38 minutes for MesoRD to finish one simulation 16 . On the same computer, for the uniform mixed condition, MCell, another Monte-Carlo stochastic reaction-diffusion simulator, takes about 24 minutes to obtain one realisation of the stochastic simulation 17 .
Note that a simpler approach of estimating δ instead of solving the optimisation problem would also be possible, and could potentially be useful for application of the approach to large-scale networks. Here, we assume that δ is a uniform distribution over u u [ , ] which changes every t N sampling time, and u and u are estimated using the total concentration changes from the experimental measurements. The number of complex molecules is then calculated for various values of u and u to fit the measurements. and t N is set to 10 s. t N is set to a value greater than dt as spatial fluctuations would not be faster than the chemical reactions. For our example, the estimated bounds for the best fit are = .
u 0 3 and = . u 0 7. As expected, the resulting simulations are less accurate than those generated by the optimisation based approach, but reasonably well matched trajectories (Noisy Langevin #2 in Fig. 4) are still obtained.

Conclusions
A novel modelling methodology for simultaneously capturing stochastic and spatial effects on the dynamics of biomolecular networks was presented. We showed how, using a relatively simple formalism, i.e. the Langevin equation with noisy kinetic constants, spatial heterogeneity in the interaction space of the network can be included in the model. The effectiveness of the modelling approach is shown through its application to the modelling of the epidermal growth factor receptor network of a human mammary epithelial cell. Using the proposed method, highly accurate simulations of stochastic spatio-temporal dynamics were produced with massively reduced computational overheads. As spatial heterogeneity produces noticeably different dynamics in the network, it is vital that future modelling formalisms are able to take this heterogeneity into account, while minimising the computational complexity of the resulting simulations. The method presented in this paper represents a significant step forward in achieving this goal.

Methods
Stochastic Simulations. For the well-mixed cases, the Gillespie's direct method is implemented and the simulations are performed using MATLAB 18 . The reaction-diffusion equations are solved using MesoRD Ver. 1.1 11 . The uniform mixed condition was also solved using MCell for the comparison of the computation time 17 . The noisy Langevin equation is implemented in MATLAB and solved using the 1st-order Euler method with the integration interval equal to 1 seconds.