Introduction

The intracellular environment is a densely concentrated region packed with a mixture of numerous types of macromolecules and subcellular structures1. Because of this dense crowding, biochemical reactions in vivo behave significantly differently from the same reactions in the well-mixed and dilute conditions of typical in vitro models1. Molecular crowding, one of critical features of the intracellular environment, affects many fundamental cellular processes, such as protein folding2, protein aggregation3, enzyme activity4, reaction kinetics5 and signal transduction6. Moreover, the relative size and shape of crowding agents are also crucial parameters in molecular crowding7,8.The specific effects of molecular crowding on any particular system, however, are not easily predicted even given detailed knowledge of the specific physical parameters of the given reaction system9,10,11. Therefore, quantitatively characterizing how crowding influences any given reaction system is a very challenging problem using in vivo or in vitro model systems. Computational modeling and simulation methods provide alternative ways to estimate the effects of individual parameters singly or in combination because these methods allow for us to easily and precisely vary reaction system parameters separately and in combination, a capability difficult to match in any real experimental system. Prior computational models, however, have significant limitations with respect to simulating model reaction systems under crowded conditions. Ordinary differential equation (ODE) models generally assume a well-mixed and diluted solution and thus ignore the crowding effect. Partial differential equation (PDE) models can include spatial constraints but also provide no explicit basis for modeling the crowding effect specifically12. Particle-based simulations provide a way to more directly address the molecular crowding effect and explore how many parameter changes might influence it. Lattice-based models provide a computationally efficient means of performing such simulations but require simplified models of particle movement and structures and are therefore prone to exaggerating the crowding effect13. Off-lattice models based on Brownian or Langevin dynamics provide for more realistic simulations but at a high computational cost for highly crowded conditions14,15, making them unsuitable for modeling crowding for large numbers of particles or long time scales. There is consequently no method that has both the realism to accurately and quantitatively simulate reactions in crowded media and the efficiency to do so for sufficiently large system sizes and time scales to model many biological reactions.

In order to overcome the inherent limitations of the various methods available, we propose a novel strategy intended to yield both efficiency and accuracy in modeling chemistry in crowded media. We accomplish this goal by using relatively computationally costly particle simulations on simple test systems to determine how multiple parameters act singly and in combination to influence the crowding effect and then fit an easily-applied regression model to outputs of these simulations for use in quickly making crowding corrections to more efficient simulation methods. In previous work, developed a coarse-grained two-dimensional stochastic off-lattice particle simulation (2DSOLM)16 based on Green's function reaction dynamics17 for simulating binding kinetics in various molecular crowding conditions. We subsequently showed that one can simplify the problem of modeling of the crowding effect by identifying a subset of “separable” parameters, whose effects one could learn independently and then merge into a model of their collective effects18. These four parameters in 2DSOLM are the total concentration (C); the probability of binding upon a collision between two reactant monomers (B); the mean time for dissociation event (M), defined as the inverse of the rate constant; and the diffusion coefficient for reactants and inert particles (D). The work showed that such an approach can simplify the problem of quantitatively modeling reactions in crowded media by allowing one to independently account for the influence of several separable parameters. It did not, however, resolve the inherent problem that the crowding effect depends on non-linear interactions of multiple physical parameters.

Here, we generalize the prior approach to account for both separable and inseparable parameters by creating a collective model accounting for the prior parameters and two additional parameters – the area ratio of dimer to monomer (α, ) and the area ratio of inert crowding agent to monomer (β, ) – whose effects are inseparable from one another and from that of the total concentration (C). Figure 1 illustrates these two cross-dependent interaction parameters. The parameter α captures the change in excluded volume induced by binding between two reactant monomers. Fig. 1a illustrates three different scenarios after the reaction: α<2, α = 2 and α>2. The first case represents a decrease in the area occupancy of two reactants, as might occur if one molecule docks into a solvent-inaccessible binding pocket of another molecule or the two bind tightly along two rough faces whose solvent-accessible areas were significantly larger than their van der Waals areas. The second case represents an unchanged area occupancy of the product relative to the two separate reactants. The third case represents an increase in area occupancy of the product binding, as may occur if the proteins bind so as to establish some new solvent-inaccessible cavities. This parameter would be expected to influence crowding due to an entropic preference for reducing excluded volume in conditions of high crowding. We would thus expect crowding to tend to drive binding for α<2 and inhibit binding for α>2.

Figure 1
figure 1

Illustrations of the 2DSOLM reaction model for two different cross-dependent interaction parameters.

The parameters α and β correspond to sizes of reactant dimer and crowding agent relative to reactant monomer. Cyan circles are reactant monomers, magenta circles are reactant dimers and open circles are inert crowding agents. (a) Illustration of varying α. (b) Illustration of varying β.

The parameter β represents another cross-dependent property describing the interaction between reactants and inert particles and specifically influencing how effectively inert particles provide steric hindrance to reactants. Three different cases are illustrated in Fig. 1b : β<1, β = 1 and β>1 for 2DSOLM. The first case corresponds to individual inert particles smaller than reactant monomers, the second to inert and reactant particles of equal size and the third to inert particles larger than reactant monomers. Unfolding experiments on ubiquitin7 have shown that reducing the size of an inert crowding agent can lead to a stronger crowding effect. However, such anecdotal observation provides little basis for quantitatively estimating how the relative size of crowding agents to reactants will affect any given the binding reaction system, especially in a background of potentially multiple other parameter changes. For example, while scaled particle theory in principle allows one to make such inferences for some isolated parameter changes, it is difficult to generalize to more complicated scenarios. In the present work, we develop a unified model accounting for both the previously separable and the inseparable parameters of the simulation, which we refer to as the unified regression model. We show that it is possible to accommodate the three inseparable parameters (α,β,C) in a single model through a multidimensional polynomial regression model and validate this model over variations in pairs of parameters. In addition, we validate our model of these three inseparable parameters by comparison with estimates from scaled particle theories19,20. We then show that it is possible to produce a full unified regression model combining these multidimensional regression models with the previously derived separable components. Finally, we show that the unified model provides an accurate description of random changes in all of the simulation model's physical parameters over a broad range of biologically relevant values in all parameters. In the process, we show a way to build a model with nearly equivalent predictive power to the costly particle simulations, but capable of easy evaluation for use in correcting computationally efficient simulations of large systems and long time scales. These findings have ramifications for our ability to computational model a diversity of cellular functions that are likely to be heavily influenced by intracellular crowding in two dimensional space, such as protein synthesis, signal transduction, or cytoskeleton assembly and disassembly.

Results

Simulation in 2DSOLM

We investigated the parameter dependence of binding reactions in crowded media by using a two-dimensional model of dimerization of a reactant monomer of radius 2.5nm simulated in a 100nm × 100nm space with a hard reflective boundary condition. This simple reaction model allowed us to focus on the specific issue of binding equilibrium in a setting tractable for exploring a broad parameter space with sufficiently large particle numbers and numbers of replicates to produce reproducible results. We began with our prior regression model expressing equilibrium constant as a multiplicative function of the effects of the four separable parameters: C (total concentration), B (binding probability), M (inverse dissociation rate) and D (diffusion coefficient)18. We then added the two cross-dependent physical parameters, α and β, which produce crowding effects inseparable from one another and from C. To extend the prior regression model to the parameters α and β, we first established a baseline simulation parameter set with default parameter values of B = 0.7, M = 1ns, D = 6.95×10−11m2s−1, α = 2 and β = 1, values chosen based on our prior simulation studies16,18 to produce a reasonably strong crowding effect as well as to approximate the temperature and viscosity conditions of the cytoplasm21,22. We fixed one additional simulation parameter, dth, describing a threshold maximum distance at which two particles can interact with one another, to be dth = 0.5nm (one fifth of the radius of a reactant monomer). In SOLM simulations, all particles are diffusible and are separated by at least a distance of dth . When the simulation places two reactant monomers within a distance of dth of one another, they must either undergo a binding reaction or a collision event, with probability B of binding or 1- B of collision16. One reactant and one inert particle or two inert particles that are placed within dth of one another always undergo a collision event16. A binding event will result in the formation of a new dimer, while a collision will result in the particle positions being resampled to be beyond dth . The choice of dth influences the physical model of the simulator, in part because dth is effectively a radius at which particles are assumed close enough to exert non-bonded forces on one another and initiate a possible binding interaction and in part because it imposes an upper limit on the maximum possible crowding level in the simulation. It also influences computational efficiency, as small dth can lead to larger numbers of events when particles are in close proximity. The default dth value was chosen empirically to permit physiologically reasonable levels of crowding cases (up to 0.45 concentration) while still yielding efficient simulations.

We then separately varied the individual parameter values for α and β in conjunction with variations in total concentration using the default values for the other parameters to test these cross-dependent parameter effects on the binding chemistry. We simulated ten α values (1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2) and seven β values (0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6). For different α value simulations, we simulated eight total concentrations (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45) with dimensionless units of fraction of total simulation area occupied by particles. To produce varying concentrations, we began with a fixed concentration of reactants of 0.1 and then added inert crowding particles to yield each higher value of total concentration (e.g., C = 0.1 corresponds to purely reactant monomers while C = 0.25 corresponds to concentration 0.1 of reactant monomers and 0.15 of inert crowding agents). For simulations of varying β, we simulated seven total concentrations (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4) again using the fixed reactant concentration of 0.1 and varying inert monomer concentrations, similar to the α simulations. Each set of parameters was run for 25µs with 30 repetitions, with progress recorded every 0.15625µs. SOLM is a discrete event-driven method and thus has no fixed time step, but rather jumps between discrete changes in system states with randomly sampled waiting times between events16. The time interval to collect data on simulation progress only affects visualization of results, not the actual progress of the simulation and was chosen to provide sufficient resolution to clearly display transient behavior early in the curve and stochastic fluctuations at long time scales. Data was collected at five different simulation times (5, 10, 15, 20, 25µs) to analyze the crowding effect on the test reaction, with values selected based on our prior work18. For each condition, we measured reaction progress by the mean number of dimers as a function of time across all simulations.

Figure 2 shows the simulation results for varying values of parameters α and β. Fig. 2a shows mean dimer counts at quasi-equilibrium for varying α values. As the parameter value α decreases, the number of dimers at equilibrium rapidly increases, especially for the high crowding cases. Interestingly, there is only a slight difference between dimer counts at very low crowding levels but the separation between the equilibrium counts rapidly increases with increasing crowding level. The Keq curve, calculated from simulation data, more clearly shows how strongly the parameter α influences the model reaction system, as shown in Fig. 2b. These results show that the parameter α and the parameter C are cross-dependent and inseparable. While the size of a dimer does not significantly influence the binding reaction under low crowding conditions, a smaller dimer is much more conducive to binding than a larger dimer under densely crowded conditions. The embedded images in Fig. 2a and 2b show snapshots of the simulator at the quasi-equilibrium state (25µs) for α = 1.4 and α = 3.2 respectively at under a highly crowded condition (C = 0.45) in 2DSOLM. Fig. 2c shows mean dimer counts at the quasi-equilibrium state based on simulation data for varying β. As the parameter value β decreases, the number of dimers at the equilibrium state rapidly increases, especially for high crowding cases. Similar to the parameter α case, variations in β produce little change at low crowding levels, but the difference rapidly increases as the crowding level increases. The Keq curve, calculated from simulation data, shows how strongly the parameter β influences the model reaction system, as shown in Fig. 2d. These results show that the parameter β and the parameter C are cross-dependent and inseparable. While the size of an inert crowding agent does not significantly influence the binding reaction under low crowding conditions, binding is much more favorable in the presence of a smaller versus a larger crowding agent under densely crowded conditions. The embedded images in Fig. 2c and 2d show snapshots of the simulator at the quasi-equilibrium state (25µs) for β = 0.4 and β = 1.6 respectively at a highly crowded condition (C = 0.4) in 2DSOLM.

Figure 2
figure 2

Binding equilibrium as a function of concentration for varying α and β.

Cyan circles represent reactant monomers, magenta circles represent reactant dimers and open circles represent inert crowding particles. Embedded screen snapshots show an illustrative quasi-equilibrium state for the maximum feasible level of crowding for each curve set. (a) Dimer counts for α = 1.4(top), 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2(bottom). Inset image shows a snapshot for C = 0.45 and α = 1.4. (b) Equilibrium constants for α = 1.4(top), 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2(bottom). Inset image shows a snapshot for C = 0.45 and α = 3.2. (c) Dimer counts for β = 0.4(red, top), 0.6(green), 0.8(blue), 1.0(cyan), 1.2(magenta), 1.4(black), 1.6(gray, bottom). Inset image shows a snapshot for C = 0.4 and β = 0.4. (d) Equilibrium constants for β = 0.4(red, top), 0.6(green), 0.8(blue), 1.0(cyan), 1.2(magenta), 1.4(black), 1.6(gray, bottom). Inset image shows a snapshot for C = 0.4 and β = 1.6.

Unified regression models

We developed three different polynomial regression models: Keq (C,α) with default β value ( = 1), Keq (C,β) with default α value ( = 2) and Keq (C,α,β). To find the best-fit degree for each model, we applied leave-one-out cross validation to models of varying degree derived from the simulation data. For Keq (C,α), a total of 80 different simulation data points was collected, making 11th degree the highest possible before the number of regression coefficients exceeds the number of data points. For Keq (C,β), a total of 49 different simulation data points was collected, making 8th degree the highest possible. For Keq (C,α,β), a total of 122 different simulation data points was collected, making 7th degree the maximum possible. Figure 3 shows the root mean square errors of the leave-one-out cross validation for Keq (C,α), Keq (C,β) and Keq (C,α,β), respectively. Sixth degree polynomials were selected as the best-fit models after cross validation. The resulting best-fit regression models for Keq (C,α), Keq (C,β) and Keq (C,α,β) are provided in equations (13).

Figure 3
figure 3

Least-squares error rates for leave-one-out cross validation test for different polynomial degree.

(a) Keq (α,C). (b) Keq (β,C). (c) Keq (α,β,C).

Surface plots of Keq values as functions of both parameters α and C are shown in Fig. S1 and S2 for simulation data and for the best-fit polynomial regression models of degrees one through eleven. Similarly, the surface plots of Keq values as functions of both parameters β and C are shown in Fig. S3 for simulation data and for the best-fit polynomial regression models of degrees one through eight. Finally, the unified regression models were built, which combined with previous regression model18 , shown in Methods.

Evaluation of the unified regression models

To evaluate three different unified regression models in equations (57 in Methods below) with the best-fit sixth degree polynomials (1–3), we randomly selected 30 test cases for each model with fixed reactant concentration 0.1 and other parameters selected uniformly at random from the ranges C = 0.1 – 0.4, B = 0.1 – 0.9, M = 0.6 – 1.4ns, D = 1.95 – 21.95 × 10−11m2s−1, α = 1.4 – 3.2 and β = 0.4 – 1.6. For each test case, we compared mean dimer counts from the simulation to the estimated dimer counts predicted by the unified regression models. Simulation values were averages of over 30 repetitions per data point. As an additional test of the performance of the model, we conducted another 30 random trials for each model for a 200nm × 200nm boundary and a 400nm × 400nm boundary in order to explore whether boundary effects significantly impact the fit of the regression model. We similarly compared these simulation results with estimates from the regression model. Simulation values were averages of over 5 repetitions per data point. Figure 4 shows the comparison between the simulated values and estimated values from our regression models using random parameter values from the full parameter space of each regression model. For the 100nm × 100nm cases in Fig. 4a, d and g, our regression model closely matches the dimer counts derived from the particle simulations at the quasi-equilibrium state. The results suggest that one can use the regression model as a reliable and much faster replacement for the particle simulations for quantitatively predicting equilibrium constants over the parameter range examined. For the 200nm × 200nm cases in Fig. 4b, e and h and the 400nm × 400nm cases in Fig. 4c, f and i, simulations and regression predictions still match very well, although the total particle number increases by four and sixteen times respectively, suggesting that boundary effects have minimal influence even for these relatively small simulations. The five-parameter regression models (Keq (C,B,M,D,α) with β = 1 in Fig. 4a, b and c, Keq (C,B,M,D,β) with α = 2 in Fig. 4d, e and f) accurately predicted the simulation dimer counts for simultaneous variation across the range of parameters previously examined singly or in pairs. The correspondence between the regression models and full simulations was somewhat worse for the full six parameter regression model (Keq (C,B,M,D,α,β) in Fig. 4g, h and i). Overall though, the correlation is strong for the vast majority of the cases, which supports the equivalency of the regression and simulation predictions over a broad range of biologically relevant values for all parameters.

Figure 4
figure 4

Comparison of dimer counts obtained from regression estimates vs . simulations.

A data point on the diagonal line would indicate perfect agreement between the two values while points above the line show overestimates of simulation values and points below the line underestimates. (a) Keq (α,C) in 100nm × 100nm. (b) Keq (α,C) in 200nm × 200nm. (c) Keq (α,C) in 400nm × 400nm. (d) Keq (β,C) in 100nm × 100nm. (e) Keq (β,C) in 200nm × 200nm. (f) Keq (β,C) in 400nm × 400nm. (g) Keq (α,β,C) in 100nm × 100nm. (h) Keq (α,β,C) in 200nm × 200nm. (i) Keq (α,β,C) in 400nm × 400nm. The concentration of reactants is fixed at 0.1 and the concentration of inert crowding agents is randomly selected from 0.0 to 0.3. The other parameters are uniformly randomly selected, with B from 0.1–0.9, M from 0.6–1.4ns, D from 1.95–21.95 × 10−11m2s−1, α from 1.4 – 3.2 and β from 0.4 – 1.6.

Comparison with scaled particle theory

The influence of non-ideal interactions on chemical reactions can be estimated by thermodynamic theories9,20. The excluded volume effect that becomes significant under conditions of high molecular crowding is one of critical non-ideal interactions. A correction factor to the equilibrium constant for excluded volume can be approximately calculated, yielding a corrected activity coefficient of reactants and products: Keq = Γ excKo , where Keq is an apparent equilibrium constant, Ko is the equilibrium constant in the ideal state and Γ exc is the correction factor for the exclusion effect from interactions among particles in the reaction system9,20.

To validate our model, we calculated the correction factor and apparent equilibrium constant for various parameter conditions for the unified regression model in Eq. (3), 2DSOLM simulation and scaled particle theory in two dimensional space (2DSPT)19,20. We first ran additional simulations at an uncrowded 1% concentration case (C = 0.01, pure reactants case) with default parameter values of B = 0.7, M = 1ns, D = 6.95×10−11m2s−1, α = 2, β = 1 and dth = 0.5nm. We assumed that the interaction among particles at 1% concentration can be reasonably ignored, allowing us to estimate an uncorrected equilibrium constant at the ideal state (Ko ) from the simulation results. We also ran simulations for three α values (1.4, 2.0, 2.6) and three M values (0.1, 1, 10ns). In addition, we ran 1% concentration simulations for three dth values (0.5, 0.25, 0.125nm). Each of these simulations was used to calculate a K0 for the appropriate condition. Tables 1 and 2 show the calculated Ko from these simulation results.

Table 1 Ko [counts−1m2] for two different parameters (M and α) from simulation results of 1% concentration, pure reactants. All the other parameter values are set to default values (B = 0.7, D = 6.95×10−11m2s−1, β = 1 and dth = 0.5nm). The data was collected at 5 time points (5, 10, 15, 20, 25µs) for 30 independent runs in a 500nm × 500nm simulation area.
Table 2 Ko [counts−1m2] for different dth parameters from simulation results of 1% concentration, pure reactants. All the other parameter values are set to default values (B = 0.7, M = 1ns, D = 6.95×10−11m2s−1, α = 2 and β = 1). The data was collected at 5 time points (5, 10, 15, 20, 25µs) for 30 independent runs in a 500nm × 500nm simulation area.

We then estimated apparent equilibrium constants as a function of total concentration for each set of parameters examined above. For tests of variation in α and M, we examined two different situtations of crowding induced by 0.1 concentration of reactants (CR ) plus inert crowding agents to yield the eight total initial concentrations: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45. As a secondary validation of the model outside the training conditions, we examined crowding induced purely by reactant particles at the same eight initial concentrations: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45. In both cases, apparent equilibrium constants for SOLM were calculated from simulations and the apparent equilibrium constants for unified regression model (URM) were estimated from Eq. (7). The apparent equilibrium constants from scaled particle theory (SPT) were calculated using calculated Ko from SOLM simulation of 1% pure reactant case and estimated correction factors based on the equations in 2DSPT for given parameters (concentration of reactants (CR ), concentration of inert crowding agents (CI ), α and β)19,20.

Figure 5 compares apparent equilibrium constants for crowding induced by adding inert crowding agents to fixed 0.1 CR and for pure reactants for varying M and α. URM, SOLM and SPT all show an apparent equilibrium constant that nonlinearly increases with more compact dimers and higher total concentrations. In addition, the apparent equilibrium constant for URM, SOLM and SPT increases as the mean time of dissociation events increases from 0.1ns to 10ns. The test case for the exclusion effect mainly from the inert crowding agent, shown in Fig. 5a,b,c, showed that URM more closely tracks the SOLM results than does SPT, consistent with the notion that the regression model is better able to mimic the more complicated interaction patterns found in the full simulations than are the more simplified assumptions of the SPT model. However, tests involving high crowding induced by the reactant particles themselves rather than an inert crowding agent, shown in Fig. 5d,e,f, showed that SPT predicts the apparent equilibrium constant more closely to SOLM simulation results than URM does, especially in cases involving more compact dimers. These latter results demonstrate that the ability of the regression model to accurately capture the behavior of the particle simulation can break down outside the parameter domain to which it was trained.

Figure 5
figure 5

Apparent equilibrium constants for SOLM, URM and SPT for two different parameters (M and α).

(a) M = 0.1ns. (b) M = 1ns. (c) M = 10ns for 0.1 concentration of reactants with additional inert crowding agents. (d) M = 0.1ns. (e) M = 1ns. (f) M = 10ns for pure reactant simulations. All other parameter values are set to default values (B = 0.7, D = 6.95×10−11m2s−1, β = 1 and dth = 0.5nm).

As a further demonstration of the utility of the regression approach, we added an extension to the model to account for an additional parameter, dth , a threshold distance used by the simulator for identifying possible reaction events among nearby particles, as well as by the Green's function reaction dynamics (GFRD) algorithm to trigger new sampling of particle positions16,17. This parameter can be considered to roughly capture the difference between an interaction radius at which two particles can come close enough to influence and a collision radius at which they would actually sterically interfere with one another. Figure 6 shows the apparent equilibrium constant for fixed 0.1 CR with added inert crowing agents for three different β values and three different dthvalues. The parameter effect of the threshold distance (dth ) is nonlinear and inseparable from other parameters that influence crowding, particularly total concentration (C). Because the URM was not trained to model dth , we added an additional multiplicative and separable term to the URM by performing a regression fit to a quadratic function in C and dth . The resulting term

provides the best quadratic fit for the given parameter range using a leave-one-out cross-validation. The figure plots predicted Keq for SOLM simulations, SPT corrections to low-concentration simulations and the URM extended with the above C and dth cross-term. There is no explicit SPT correction available for this parameter, although the SPT results do reflect the change in the low-concentration equilibrium K0 induced by changes in dth . All three models show similar non-linear behavior at the default dth of 0.5nm. At dth = 0.25 nm, the extended URM provides a noticeably closer fit to the full particle simulations than does SPT, showing that even a crude and essentially physically naive regression model can reasonably capture the behavior of the full simulations. At dth = 0.125nm, results are more mixed, with the extended URM showing a better fit for much of the range of β and C, although eventually breaking down at the most extreme conditions of high concentration and small crowding agents. The results thus demonstrate the ability of regression modeling to capture a physical effect for which we lack a good analytical model, although also the tendency of such regression models to break down as one approaches the limits of the parameter range to which they were trained.

Figure 6
figure 6

Apparent equilibrium constants for SOLM, URM and SPT for two different parameters (β and dth ).

(a) dth = 0.5nm. (b) dth = 0.25nm. (c) dth = 0.125nm for 0.1 concentration of reactants with additional inert crowding agents. All the other parameter values are set to default values (B = 0.7, M = 1ns, D = 6.95×10−11m2s−1 and α = 2).

Incorporation of URM corrections into ordinary differential equation (ODE) simulations

The major goal of this work is to develop fast but versatile corrections for crowding effects that can be incorporated into more complex simulation models. To that end, we conducted a simple demonstration of how the URM models can be used in building efficient crowding corrections into fast crowding-free simulations. We simulated a homodimerization reaction, comparing SOLM particle simulations of the reaction, standard mass-action ODE simulation without crowding corrections and a crowding-corrected ODE using our unified regression model (URM). Each SOLM particle simulation used default parameter values (B = 0.7, M = 1ns, D = 6.95×10−11m2s−1 and α = 2, β = 1, dth = 0.5nm) and was repeated 30 times with the average of the 30 runs used at each time point for visualization. We repeated each simulation using an ODE model of the system lacking correction for any crowding effects, deriving a dissociation rate koff from the mean dissociation time M and an association rate kon from koff and the estimated idealized equilibrium constant K0. We then repeated each simulation using an ODE with koff again derived from M but kon derived from M and the URM-predicted equilibrium constant Keq (C,B,M,D,α,β) in Eq. (7). More details of the calculations are provided in Methods. Two sets of simulations were conducted, one using 0.2 total concentration (0.1 CR + 0.1 CI ) and one using 0.3 total concentration (0.1 CR + 0.2 CI ). All simulations were run for 1 µs of simulation time. Figure 7 shows results for C = 0.2 (Fig. 7a) and C = 0.3 (Fig. 7b), highlighting 0.01 µs for C = 0.2 and 0.1 µs for 0.3, sufficient to display both transient and pseudo-equilibrium behavior in each case. While both ODE models substantially deviate from the full particle model in the early transient phase, the URM-corrected model adjusts to yield an equilibrium value consistent with the full SOLM particle model while the uncorrected ODE substantially understates the equilibrium produced by the full particle model. Table 3 shows run times for these sample simulations, as well as larger simulations quadrupling the simulation space and particle counts. As the table shows, both the crowding-naive and URM-corrected ODE models show comparable run times to one another, essentially independent of the scale of the problem. The SOLM particle simulations, however, produce substantially longer run times than those of either ODE model in both conditions and show approximately a 16-fold increase in run time as the system size is scaled four times. This quadratic increase in problem size for the particle methods is expected due to an approximately linear increase in work per event and linear decrease in mean waiting time between events with increasing problem size16.

Table 3 Run time comparison for SOLM particle simulations, ODE simulations using URM corrections for crowding and uncorrected ODE simulations. The table reports run times in seconds for 1µs simulations of dimerization reactions using default parameter values for each of the three models and for two system sizes and two crowding levels.
Figure 7
figure 7

Demonstration of URM as a method for adjusting for crowding effects in simulations.

Each plot shows a comparison of a full SOLM model, an ODE model ignoring crowding effects and an ODE model using the URM to adjust for crowding in the simulation. Each SOLM simulation uses default parameter values (B = 0.7, M = 1ns, D = 6.95×10−11m2s−1 and α = 2, β = 1, dth = 0.5nm). Each SOLM curve shows an average over 30 repetitions, with error bars omitted for clarity. ODEs are simulated using estimated rate parameters from the quasi-ideal equilibrium constant in table 1 and default parameter value of M. URM simulations are simulated using the default value of M to derive reverse rates and URM estimates of keq to determine forward rates for the given M. (a) total concentration 0.2 (0.1 CR + 0.1 CI ). (b) total concentration 0.3 (0.1 CR + 0.2 CI )

Discussion

We have demonstrated that it is possible to establish a unified regression model of binding equilibrium in crowded media that can in principle incorporate any physical parameters one can model in a full particle simulation, even when those parameters exhibit cross-dependencies for which we lack any predictive analytical theory. We have shown that this approach can accurately mimic a computationally costly particle model over a broad range of several such parameters. We have in the process shown the feasibility of a strategy for quantitative simulation of reaction chemistry in crowded media that will maintain comparable accuracy to detailed particle simulations while enabling the use of much faster simulation methods. Our model is consistent with previous theoretical studies20 in finding that total concentration (C) and size ratio parameters (α, β) act non-linearly to affect the binding equilibrium constant and further captures both the cross-dependency of three key parameters (C, α, β) on the model reaction system and the independent parameter effects of three other parameters (B, M, D). The correction factors and apparent equilibrium constants derived from our regression model generally match well with those derived from scaled particle theory, demonstrating that our model captures the crowding effect for various conditions of C, α, and β, reasonably. However, because the apparent equilibra and corrections of our regression model are based on empirical measurements from particle simulations, the approach will extend in a straightforward manner to a much broader range of parameters. We have further demonstrated how one can use this regression model as a fast correction for crowding effects in efficient ODE simulations. While the results of this corrected ODE model do not capture kinetics of the crowded system as well as the true particle simulation, they nonetheless provide an alternative that accurately captures the effects on system equilibrium of the full particle simulations while exhibiting run times comparable that of a simple ODE.

The principal alternatives for modeling chemistry in crowded conditions are the direct use of detailed particle models, similar to our SOLM but potentially far more involved in the range of parameters and details of the physical model and scaled particle theory, which uses simpler models for which one can analytically calculate the contribution of the excluded volume event to reaction equilibria. The regression approach is intended to balance the competing benefits of these two alternatives: particle models in allowing one to explore effects of many parameters with potentially complicated cross-dependencies and SPT in allowing one to quickly compute corrections for crowding. Particle models are computationally costly, making them unsuitable for extremely large systems, slow reactions, or very complicated models. SPT's major limitation is that it depends on one's ability to derive analytical models for the specific parameters or phenomena one wishes to consider. SPTs to date make various assumptions that one may wish to relax in practice and such corrections may prove difficult to analyze. For example, both SPT and SOLM conventionally assume that individual particles are hard spheres in 3D and hard circles in 2D. This assumption has been relaxed in various ways in the SPT literature. Rajagopalan's group extended from the hard-sphere interaction model of SPT to the nonbounded square-well interaction model23 and Mittal's group examined both repulsive interaction and attractive interaction of crowding particles24,25. In addition, Minton extended from a simple hard sphere model in SPT to two different particle models of a querying protein: Gaussian cloud and equivalent hard sphere to examine the excluded volume effect on more realistic conditions26. These are not trivial extensions, however and are not easily made by users of the technology who are not experts in the underlying theory. By contrast, the results above demonstrate how a comparable extension of the regression approach can be accomplished with no major conceptual changes to account for varying effects of an extra parameter, dth , that effectively controls a difference between the collision radius and interaction radius of reactant particles. While the essentially naive modeling that goes into such regression fitting has its own limitations and cannot replace good theory when that is available, it does provide a serviceable substitute for theory for systems too complex to allow for analytical solutions. Our regression approach thus provides a key step towards computationally practical simulations of complex reaction systems in crowded media and thus towards realistic models of intracellular biochemistry.

As our demonstration example of URM-corrected ODEs shows, the model exhibits essentially the same thermodynamic behavior as the full particle simulations. However, the corrected ODEs exhibit very different kinetics from the full particle simulations, more comparable to those of the ODE. We attribute this inaccuracy in transient behavior to the fact that the URM at present is fit only to thermodynamic data and thus leaves the rate constants underdetermined. The regression approach can in principle be extended to fit rate constants, not simply equilibrium constants, potentially yielding fast corrections to both kinetics and equilibrium but future work will be needed to explore the effectiveness of such regression corrections on kinetics.

Even for thermodynamics, the equilibrium model is not a perfect representation of the system and additional challenges may be created by extending it to more parameters or unexplored regions of the parameter space. The accuracy of full six parameter model is decreased relative to that of the five-parameter models, which suggests that more accurate accounting of cross-effects may be needed. This observation may derive in part from our decision to omit cross αβ terms in the regression equation to reduce the number of regression parameters to a feasible level. It has previously been established that concentration acts nonlinearly on binding chemistry9,14,20,23. We can further conclude that allowing for cross-dependent parameters does require a higher order model of C than the third-order model18 that proved sufficient when capturing only the effects of separable parameters. Further increasing model complexity may be required to handle a richer model incorporating physical factors neglected in the present model.

It is also important to note that although the motivation behind this work is to improve models of chemistry in the intracellular environment, the model itself is a highly simplified view of binding chemistry in crowded media. The true intracellular environment is far more complicated than our simple reaction model, containing various mixtures of different sizes of proteins and other molecules that are often irregularly shaped and involved in many forms of macromolecular complex or other complicated interaction patterns. Recent simulation work in the area has shown the value of far more complicated physical models, e.g., in the recent work of Kim and Yethiraj using Brownian dynamics to explore effects of crowding on ligand binding to membrane-bound receptors.27 While our work has not been extended to such detailed models, it nonetheless shows that it is possible to produce a simple unified mathematical model that can reliably reproduce the results of qualitatively similar computationally costly particle simulations for variations in several key parameters of such a system. Nonetheless, one would need to account for many other parameters to produce a quantitatively predictive model of a realistic intracellular medium or any particular reaction system within it. While no method, including ours, can yet approach that goal, the present work provides a new approach toward more complex simulations at any degree of abstraction. Another obvious extension of the model is to three dimensional systems. We chose to establish the regression approach on two dimensional systems primarily because it provides a simpler and more computationally tractable framework in which to validate and demonstrate the proposed regression strategy. Two dimensional crowding does nonetheless have important applications in modeling association of membrane bound proteins20, for example in helping to explain observed patterns of clustering in the syntaxin-1 system28. While there are no appreciable conceptual obstacles to adapting the present methods to three dimensions, it remains to be shown that accuracy of the model would be comparable to that in 2D and that the increased computational cost of running particle simulations for training the regression models would not be prohibitive.

In the future, we hope to expand on the general idea of fitting regression models to increasingly detailed particle simulations in order to develop more realistic and complicated reaction models suitable for producing quantitatively predictive simulations of reaction chemistry in vivo. A similar approach may prove valuable to a diversity of fields that similarly depend on computationally costly particle models, including various disciplines of biology, physics, chemistry and engineering.

Methods

2DSOLM

Our previously developed two dimensional stochastic off-lattice model (2DSOLM)16 was used to conduct all particle simulations described in the paper. A detailed description of the simulation program and its underlying algorithms is provided in our prior work16. To effectively simulate some high concentration cases, a modified initialization protocol was used, as described in our prior research18. Initially, all reactants are monomers. Initialization of particle positions is performed by establishing a grid of potential particle positions at the maximum possible packing density for whichever of reactant monomers and crowding agents occupies the larger total area and then randomly inserting particles into the corresponding grid positions. This protocol was developed because it makes it possible to initialize in highly crowded conditions where independent uniform placement of particles would usually result in overlapping particles, such as for high concentration cases. The simulation program of 2DSOLM was implemented by C++ programming language and run on a Linux Beowulf cluster. The collected data files were analyzed and plotted using Matlab (R2008a). Figures 2, 3, 4, 5, 6, 7, S1, S2, S3 were generated using Matlab. Simulation snapshots were plotted using Matlab and then simulation movies were made using Adobe Photoshop by concatenated these snapshots. The simulation code for 2DSOLM was released at the web site: http://www.cs.cmu.edu/russells/projects/crowding/crowding.html.

The specific simulations run and their parameters are as described in the Results.

Calculating thermodynamic equilibrium constants by SPT

In calculating the thermodynamic correction factor (Γ exc = Keq /Ko ), we first assumed that 1% pure reactant case can be reasonably considered as an ideal state at which interaction among particles in such a diluted condition will be negligible20. For example, the calculated correction factor of 1% pure reactant case using simulation data in the default parameter value is 1.02, which is reasonably closed to the ideal state (Γ exc = 1). The simulation area was set to 500nm × 500nm for this condition to yield a large enough number of reactants to get accurate measurements for the given parameter conditions at 1% pure reactants case. The correction factor for SPT (, where γM and γD are activity coefficients of reactant monomer and dimer, respectively) was calculated by using the simulation data at 1% concentration to determine K0 followed by SPT analytical approximation to the activity coefficient under crowded conditions, as described in the prior literature on scaled particle theory in two dimensional space19,20. The density of reactant monomers and dimers under any given condition was calculated by the following equations18:

where ρR = density of reactant monomers at initial state [particle counts/simulation area], = density of reactant monomers at quasi-equilibrium state in the ideal state and = density of reactant dimers at quasi-equilibrium state in the ideal state. K is then calculated by solving the equation where

where activity coefficients are

where ρ (density) = number of particles / simulation area, R = radius of a particle for each particle species: M (reactant monomer), D (reactant dimer), I (inert crowding particle)19,20. Finally, the apparent equilibrium constant is calculated by multiplying the correction factor to Ko , which is calculated from simulation results of 1% pure reactant cases for given parameter conditions.

Simulation movie files

We made two movie files to demonstrate the simulation process and show the effects of the two cross-dependent parameters. Each movie contains two different moving images, which are distinguished by different values for a single parameter while all other parameters use the common default values. Video S1 presents α = 1.4 vs. α = 2.6 for C = 0.45 and video S2 presents β = 0.4 vs. β = 1.6 for C = 0.4. The first half of each movie shows the system in the initial (pre-equilibration) state and the second half of the movie shows a quasi-equilibrium state. Movie files illustrating the simulation progress and the effects of each parameter individually were provided for the previous independent parameters18. High-resolution versions of the movies can be downloaded from: http://www.cs.cmu.edu/russells/projects/crowding/crowding.html.

Fitting Keq to simulation results

We calculated the equilibrium constant (Keq) of the homodimerization reaction by solving for Keq as a function of the initial monomer concentration and mean steady-state dimer concentration. We then developed a regression model for Keq as a function of the simulation parameters by extending our prior regression model of four parameters (C, B, M and D) up to six parameters (C, B, M, D, α and β). First, we used our prior model for the contributions of the parameters B, M and D, which contribute linearly and independently to the equilibrium constant18, as shown in equation (4):

We then built regression models of Keq (C,α), Keq (C,β) and Keq (C,α,β) to include the parameter sets (α), (β) and (α,β). Because the three parameters (C,α,β) are cross-dependent, they cannot each be separately incorporated into the regression model. Unified regression models for these three test cases are provided in equations (5-7):

To find the best-matched regression model for each of the three cases, we performed least-squares polynomial regression for each parameter set, beginning with first-order regression models and increasing the degree until it was not possible to increase further without the number of regression parameters exceeding the number of data points. The general forms for the three regression models of Keq (C,α), Keq (C,β) and Keq (C,α,β) are as follows:

Coefficients for the polynomials were calculated by the least-squares polynomial regression. For the polynomial Keq (C,α,β), we neglected the cross αβ terms to reduce the number of coefficients of the model, on the hypothesis that these terms will have a negligible effect beyond the effects for which and terms account. We selected the best-matched polynomial across degrees by minimizing the least squares error using a leave-one-out cross validation test for each of the three cases.

Extension of the URM to account for threshold distance (dth )

An extension to the URM to account for effects of dth was developed by adapting the same protocol from above to learn an additional multiplicative factor to the standard URM. The extension assumed that the previous full URM Keq (C,α,β) could be corrected to account for both the direct influence of dth and the cross-dependence of dth and C with a multiplicative term learned by polynomial regression on dth and C. Simulations for three values of dth (0.5, 0.25, 0.125 nm) and three values of β (0.4, 1.0, 1.6) were fit to a series of polynomial models of the form

Least-squares regression with leave-one-out cross-validation found a second order model to yield the lowest cross-validated error. These fits were then used to create an extended URM, , used to evaluate the ability of the regression approach to incorporate additional parameters for which no equivalent SPT theory is available.

Use of URM as a correction to ODE simulations

To demonstrate of the use of the URM as a means of quickly providing crowding corrections for fast space-free simulations, we conducted a set of demonstration simulations using the URM to adjust rates to a fast ODE model. We compared this URM-corrected ODE model to full particle simulations with SOLM and to crowding-naive ODEs. SOLM simulations were conducted using default parameter values (B = 0.7, M = 1 ns, D = 6.95×10−11m2s−1 and α = 2, β = 1, dth = 0.5 nm, 100 nm × 100 nm space) for 0.1 CR + 0.1 CI and 0.1 CR + 0.2 CI . Each simulation was repeated 30 times with the resulting values averaged at each time point for visualization. Comparable ODE models were instantiated by assuming the dissociation rate koff is the inverse of dissociation time M and that association time kon is related to the idealized equilibrium constant K0 by the formula

K0 for each simulation was estimated for the idealized no-crowding condition from 1% concentration SOLM simulations, as in table 1. URM-corrected ODEs were derived using an ODE model with dissociation rate koff again set to M−1 but kon recalculated at each step of the ODE using the URM from the formula:

The model is thus corrected so as to yield URM-derived equilibrium values. Simulations were run for 1 µs for each model and concentration. As an additional test of run time scaling of the model, we repeated each simulation under identical conditions but using a 200 nm × 200 nm simulation space, with particle counts scaled proportionately to maintain concentrations at C = 0.2 or C = 0.3. Run times were recorded for the ode45 call for both ODE and URM simulations and for the full execution of the SOLM model for particle simulations.

Both ODE models were simulated with Matlab 7.7.0, R2008b using the ode45 embedded Runge-Kutta integration method. SOLM was implemented using C++ and run in GNU bash version 3.2.9 (1), release (x86-64-redhat-linux-gnu). All run time tests were run on a single Intel (R) Core (TM) 2 Quad CPU Q9550 @ 2.83GHz, 4GB RAM workstation using Linux Fedora release 7 (Moonshine).