Introduction

Mathematical modeling is the method of choice to test hypotheses and make novel predictions concerning complex biological signaling networks1. Decades of biochemical experiments and recent advances in high-throughput technology provide us with an increasing amount of quantitative data on protein interactions. Once integrated in detailed mathematical models, these data can be used to delineate and clarify regulatory mechanisms of protein signaling networks and to make novel predictions of unexpected properties of the system, which can then be tested experimentally2,3. The predictive power of these models depends on the accuracy of both the network and the parameter values. Experimental data is typically scarce and many kinetic parameter values and cellular protein concentrations are either unknown or can only be measured in vitro. Cellular concentrations of signaling proteins vary substantially across different cellular conditions and estimated in vitro values may not always reflect the in vivo behavior4.

Model-based experimental design is used in many fields of science to optimize the planning of the necessary experiments5. Similar strategies have also been explored in biology6,7,8, but the design strategies are typically heavily dependent on initial parameter choices and given the experimental restrictions, there is often not much scope for further experimental design beyond what is possible by verbal reasoning. Often data is therefore acquired before modelling and parameter values are then estimated using standard parameter identification procedures9,10,11,12. Local parameter optimization procedures typically yield one optimal parameter set together with confidence intervals. Global optimization procedures can yield a larger set of distinct parameter sets that all reproduce the data similarly well13,14. Sensitivity analysis is often carried out to analyse how parameter perturbations around the global optimum/optima affect model predictions7,15,16.

The experimental observations usually do not suffice to unambiguously determine all parameter values and many parameter values remain poorly constrained even when detailed quantitative data is available because the measured model output is insensitive to these parameter values for the applied perturbations17. This issue has been recognized as “parameter sloppiness”18. If model predictions are based on one optimal parameter set then a parameter value must also be fixed for the sloppy parameters. If the model prediction of interest depended on one such sloppy parameter then model predictions may be severely hampered. One way to avoid such shortcomings is to use an ensemble approach. Ensemble approaches can be used to explore the type of model predictions that can be made for the sets of possible parameter values19,20. Clustering can then be used to classify the different predictions.

We have previously studied a detailed model for the regulatory network that controls cell differentiation during sporulation in Bacillus subtilis21,22. The model was based on detailed, quantitative measurements and the parameter values were adjusted such that all data was reproduced very well. The model was subsequently used to explore the physiological situation. The model was sufficiently powerful to predict that the in vivo rate of the key phosphatase SpoIIE had to be 100–150 times lower than what had previously been measured in vitro. This model prediction was confirmed in experiments, thus demonstrating the predictive power of the model. The model further revealed the allosteric nature of the interaction between SpoIIAA and SpoIIAB, which could also be confirmed in experiments. The model was subsequently used to define the mechanism by which activation of the transcription factor σF is achieved in the smaller prespore, but avoided in the larger mother cell upon asymmetric cell division during sporulation. The model predicted that this would be the result of the difference in cell size. The phosphatase SpoIIE is a membrane protein and concentrates on both sides of the septum, while both its substrate SpoIIAA and the kinase, SpoIIAB, are cytoplasmic proteins. Given the smaller size of the prespore, the activity of SpoIIE is higher in this compartment, thus triggering the release of σF from an inhibitory complex with SpoIIAB by dephosphorylating SpoIIAA; unphosphorylated SpoIIAA binds to SpoIIAB and displaces bound σF. The model further showed that the small difference in SpoIIE activity is sufficient to result in differential cell fate because of the cooperative binding of SpoIIAB and SpoIIAA and because of the low turn-over rate of the phosphatase21,22.
As a result of the low turn-over rate of SpoIIE, most SpoIIE is bound to phosphorylated SpoIIAA and accumulation of SpoIIE on the septum therefore increases the concentration of SpoIIAA in the smaller prespore. The model also suggested that an important driving force behind the organization of the genes spoIIAA, spoIIAB and sigmaF on the spoIIA operon may have been protection from molecular noise23. Addition of physiological noise levels to the expression of these genes in the model led to a reduction of the sporulation efficiency to 40% if spoIIAB was expressed separately and to 60% if spoIIA was expressed separately. These predictions matched the observed sporulation efficiencies that were obtained when either spoIIAB or spoIIA was expressed separately, but under the same promoter.

The above predictions were all made with a single parameter set. While experiments confirmed the predictive power of the model, we wondered whether similar predictions would have been obtained for all physiological parameter sets that allow us to reproduce the experimental data. We therefore conducted a global parameter screen within the physiological parameter ranges and restricted these with the quantitative data using evolutionary sampling24. Although exhaustive sampling of the physiological parameter space is impossible, qualitative model behaviours typically cluster within the parameter space, thus permitting an ensemble approach19,20,25,26,27. We subsequently tested to what extent the parameter sets that allowed us to reproduce the measured in vitro data would also allow us to reproduce the in vivo behaviour. Initially we focused only on the published kinetic data. We found that the parameter sets that allowed us to reproduce all measured time courses still failed to reproduce the in vivo behavior of the network. Comparison of the successful and unsuccessful simulations highlighted the critical parameters. Upon constraining these by further published data, all simulations predicted the physiological data correctly, even though many parameter values remained poorly restricted. We conclude that evolutionary sampling can provide a broader view on the qualitative behaviour of complex physiological networks and it confirms that, for the available data, the model predictions are highly stable once the key parameter values are constrained.

Results

In a first step, we sought to formulate a comprehensive ODE model for the regulatory network. The previously published model had been hand-written and followed the interactions of 4 components (σF, AB, AA, IIE) and their modified forms and complexes to describe the in vitro experiments21. To describe the in vivo situation 2 further components (σA and RNA polymerase) had been added in the original model. Given the combinatorial complexity (compare Fig. 1A and Fig. S1), the original model comprised 50 ODEs to describe 150 different reactions between these few components, even though a number of less relevant reactions and states were ignored. Rule-based models facilitate the formulation of differential-equation-based models when multiple interactions among the components give rise to combinatorial complexity and thus allow the efficient formulation of comprehensive models. We therefore formulated a rule-based model as graphically summarized in Fig. 1B28. The rule-based model was formulated to be applicable to both the in vitro and in vivo situations and it also considered nucleotides as a variable species. Accordingly, the rule-based model included seven molecule types. The network was described by 12 main rules (as specified in the Methods section). In brief, the activity of the master transcription factor σF is controlled by three proteins: SpoIIAB (AB), SpoIIAA (AA) and SpoIIE (IIE). AB binds σF and keeps it in an inactive state. Binding of AA to the complex triggers the release of σF, which is then free to bind the core RNA polymerase, thereby forming the active holoenzyme, which can be directed to the transcriptional sites. AB acts as a kinase and phosphorylates bound AA, and IIE is the corresponding phosphatase that de-phosphorylates AA. As part of the phosphorylation reaction, one ATP is converted into ADP at the nucleotide-binding site. We showed previously that the AB dimer binds AA cooperatively, thus enhancing the sensitivity of the mechanism21.
Another transcription factor, σA, is competing with σF for RNA polymerase binding. The rules were translated into a set of 59 state variables and 190 individual reactions using BioNetGenerator29. All the reactions were assumed to take place in the cytoplasmic compartment. The system was solved using the numerical integrator ode15s in MATLAB.

Figure 1

Graphical representation of the model.

(A) (Left) Cartoon illustrating the asymmetric division of bacteria in response to starvation and the formation of a septum separating the larger mother cell from the prespore. (Right) Cartoon of the biochemical interactions regulating σF during sporulation in B. subtilis. The activity of σF is controlled by the proteins SpoIIAB (AB), SpoIIAA (AA) and SpoIIE (IIE). AB binds σF and keeps it in an inactive state. Binding of AA to the complex triggers the release of σF, which is then free to bind the RNA polymerase. This figure has been reproduced from a previous publication21. (B) Contact map according to the rule-based formulation. The molecules involved in the regulation are indicated with smoothened rectangular shapes. As shown in the legend, interactions between molecules are indicated with line segments connecting their corresponding binding sites, which are indicated with circles. Competitive binding sites are half black. Rectangular shapes on the molecules indicate conformational changes, including closed/open, High/Low affinity and nucleotide types. The rules in the Methods part specify these interactions.

Parameter screens to identify parameter ensembles

Experimental data exist for both the in vitro and in vivo situation. The in vitro experiments characterized the interactions between AB, AA, IIE and σF in the absence of σA and core RNA polymerase (Fig. 2A, first column). The first panel of Figure 2B shows the key read-out of the physiological response, the concentration of the σF-bound RNA polymerase holoenzyme. To support sporulation, the concentration of σF-bound RNA polymerase holoenzyme must be negligible before septation and must subsequently reach micromolar concentrations within 15 minutes or less, as illustrated in the first panel of Figure 2B.

Figure 2

Comparison of Model Predictions and Experimental Data.

(A) Six panels of time-resolved in vitro experimental data, measuring the fraction of σF-bound AB (a,b,d–f) and the ratio of phosphorylated AA per AB dimer (c). Different colors within the same panel indicate different experimental conditions, as described in detail in ref. 21 and in the Methods section. 10⁶ simulations of these in vitro experiments were performed using the initially defined parameter ranges. 100 randomly selected model outputs and the 100 best ranked according to their sum of squared residual errors are illustrated. (B) Behavior of the in vivo response, according to ref. 21. Using the same parameter sets as for the in vitro simulations above, we simulated the in vivo conditions and observed the formation of the σF-RNA polymerase holoenzyme during the sporulation process.

To run the simulations we had to define initial ranges for the parameter values. Plausible ranges for the different parameter values are known from the literature. In previous similar sampling strategies, the parameter ranges were further adjusted by centering them around measured values from the literature24. We followed a similar strategy and centered all parameter ranges around the previously estimated parameter values (Table 1). We then sampled from a range that extended to ten-fold higher and lower values for protein concentrations and to 100-fold higher and lower values for reaction rates. We sampled 10⁶ parameter sets that were drawn at random from a log-uniform distribution. New parameter ranges for subsequent screens were defined based on the subset of the parameters corresponding to the best 10³ fits according to a least squared residual ranking (Fig. 3). After multiple cycles of this process, the model exhibited stable model predictions for the in vitro behavior. The results of this iterative process in terms of the model behavior are illustrated in Fig. 2A.
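The log-uniform sampling described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the parameter names and reference values are hypothetical placeholders rather than entries of Table 1, and only the fold ranges (100-fold for rates, ten-fold for concentrations) are taken from the text.

```python
import math
import random

def sample_log_uniform(center, fold, rng):
    """Draw one value log-uniformly from [center / fold, center * fold]."""
    exponent = rng.uniform(-math.log10(fold), math.log10(fold))
    return center * 10.0 ** exponent

def sample_parameter_set(reference, folds, rng):
    """Sample every parameter around its reference value."""
    return {name: sample_log_uniform(value, folds[name], rng)
            for name, value in reference.items()}

# Hypothetical reference values, for illustration only (not from Table 1).
reference = {"k_on": 1.0, "k_off": 0.1, "AB0": 1.0}
# 100-fold up and down for reaction rates, ten-fold for concentrations.
folds = {"k_on": 100.0, "k_off": 100.0, "AB0": 10.0}

rng = random.Random(0)  # fixed seed so the draw is reproducible
samples = [sample_parameter_set(reference, folds, rng) for _ in range(1000)]
```

Sampling the exponent uniformly, rather than the value itself, makes each order of magnitude within the range equally likely.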

Table 1 Model parameters. Parameter descriptions, names, basal values and units and initially sampled ranges. The model parameters comprise kinetic rate constants (1–36) and protein concentrations (37–44). In the in vitro case 31 parameters were sampled (underlined). The initial ranges of the kinetic parameters span four and the initial concentrations two orders of magnitude. The parameters in bold were only used in the in vivo case. Parameters, which were not sampled, either have fixed values, or they are scaled relative to other sampled parameters
Figure 3

Flowchart of the evolutionary sampling.

The initial parameter ranges were based on information available in the literature (Table 1). The ODE system was subsequently solved for multiple parameter sets sampled from these ranges. The sum of the squared residuals between the model predictions and the data was used to rank the parameter sets. Based on the best-n fits the next parameter ranges were calculated. If the parameter ranges were updated the system was again simulated for the new parameter ranges. This process was followed until the parameter ranges could no longer be further updated.
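The loop in the flowchart can be sketched on a toy problem. The sketch below is an illustration under stated assumptions: the exponential decay in `simulate` stands in for the ODE system, the data are synthetic and noise-free, the range update (the span of the best fits) and the stopping tolerance `tol` are assumed rules rather than the criteria of the study, and the sample sizes are far smaller than those used there.

```python
import math
import random

def simulate(params, times):
    """Toy stand-in for the ODE solution: a single exponential decay."""
    return [params["A"] * math.exp(-params["k"] * t) for t in times]

def ssr(params, times, data):
    """Sum of squared residuals between model output and data."""
    return sum((m - d) ** 2 for m, d in zip(simulate(params, times), data))

def sample(ranges, rng):
    """Draw one parameter set log-uniformly from the current ranges."""
    return {name: 10.0 ** rng.uniform(math.log10(lo), math.log10(hi))
            for name, (lo, hi) in ranges.items()}

def update_ranges(best_sets, ranges):
    """Assumed update rule: shrink each range to the span of the best fits."""
    new_ranges = {}
    for name in ranges:
        values = sorted(p[name] for p in best_sets)
        new_ranges[name] = (values[0], values[-1])
    return new_ranges

def evolutionary_sampling(ranges, times, data, n_samples=2000, n_best=50,
                          max_cycles=20, tol=0.05, seed=0):
    """Sample, rank by SSR, update ranges; stop when ranges stop shrinking."""
    rng = random.Random(seed)
    for _ in range(max_cycles):
        sets = [sample(ranges, rng) for _ in range(n_samples)]
        sets.sort(key=lambda p: ssr(p, times, data))
        new_ranges = update_ranges(sets[:n_best], ranges)
        # Largest reduction of any range width, in log10 units.
        change = max(
            abs(math.log10(ranges[n][1] / ranges[n][0])
                - math.log10(new_ranges[n][1] / new_ranges[n][0]))
            for n in ranges)
        ranges = new_ranges
        if change < tol:
            break
    return ranges

times = [0.5 * i for i in range(21)]
data = simulate({"A": 1.0, "k": 0.5}, times)  # synthetic data
initial = {"A": (0.1, 10.0), "k": (0.005, 50.0)}
final = evolutionary_sampling(initial, times, data)
```

In this toy case both parameters are well constrained by the data, so both ranges contract. In a model with sloppy parameters, the ranges of parameters that the data do not constrain would be expected to stay wide.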

Time resolved data of six different experimental settings monitor the fraction of σF-bound AB (2A,a,b,d–f) and the ratio of phosphorylated AA per AB dimer (2A,c) under different conditions, as described in detail previously21. Each experiment focuses on characterizing different aspects of the system: association of nucleotides with AB (2A,a,c,d), binding of AA with AB (2A,b,d,f), phosphorylation of AA (2A,b) and the effect of the phosphatase IIE in the dissociation of the σF-AB complex (2A,e,f). Considering the normalized contribution of each of these six experiments to the overall sum of residuals of the system, we can rank them according to their fitness. However, the sum of the square residuals, as a measure of fitness, does not suffice to capture behaviors, which would be considered important, such as the steepness of the response curves in 2A,d. Therefore, in addition to the squared residuals, the steepness of the response was also taken into account for this experiment. As expected, already in the initial parameter sets, there is a visible difference in the quality of 100 random fits compared to the 100 best fits. Nonetheless, it is only after several screens that the 100 best parameter sets allow a close fit of the model to the experimental data (Fig. 2A).

The sequential restriction of parameter ranges during the iterations of the screen is shown in Fig. 4 and is summarized in the last column of Table 2. During this evolution of parameter ranges only few parameter ranges become strongly restricted (Fig. 4, Table 2). Among these are the initial concentrations of both AB and σF. The initial concentrations of the other components directly depend on those of AB and σF and are therefore equally restricted. The protein concentrations were adjusted carefully in the in vitro experiments and while absolute concentration measurements are always difficult, the relative concentrations could be adjusted well and we therefore fixed the relative concentrations also in the model. The fact that the screens greatly restricted the initial concentrations of both AB and σF confirms the high sensitivity of the network to the concentrations of the players.

Table 2 Analysis of failure to reproduce the in vivo behavior. The sampled parameter sets of the last screen were divided into two groups, based on their success or failure to reproduce the in vivo response. The equality of the distributions of each parameter in the two groups was tested using the non parametric Kolmogorov-Smirnov two-sample test. Similarly, the equality of the means of these distributions was tested with a two sample T-test. The p-values of these tests are listed. Comparing the initial and the final ranges, the last column indicates the percentage of constraint with respect to the initial parameter ranges
Figure 4

The evolution of parameter ranges.

All parameters, which became constrained during the screening, are listed. The reference value is indicated by green dots, indicating the centers of the initially sampled parameter ranges. These ranges extended to ten-fold higher and lower values for protein concentrations and to 100-fold higher and lower values for reaction rates. The sequential restriction of parameter ranges during the screening is illustrated with gray shadow. Red dots indicate the central values of the final parameter ranges. The parameters k_IIEon and k_dephos (bold underlined) are two of the parameters, which needed to be further restricted to reproduce the in vivo behavior (Fig. 5, Table 2).

Among the kinetic rates, the ranges of the phosphorylation rate of AA, the rate of switching from ADP-bound to ATP-bound AB, the rates of complex association and dissociation of AB with σF, the opening and closing rates of the AB lids covering the pockets of the nucleotide binding sites and the rates of nucleotide association and dissociation with AB were strongly restricted (Fig. 4). The other kinetic ranges that were affected by the screens were less constrained and 7 parameters were not affected at all by the screen (Fig. 4).

The differences in the extent, to which parameter ranges become restricted, reflect the type of data that was available. Thus the phosphorylation data in Fig. 2A c strongly restrict the phosphorylation rate. On the other hand, the four binding rates of the RNA polymerase are not restricted at all by the in vitro data as the RNA polymerase was not included in the experiments. Similarly, the SpoIIE-dependent rates are only partially restricted as there is only a single experiment (Fig. 2A e,f) that involves SpoIIE. The SpoIIAA-P dimerization rates and the SpoIIE-AAp off-rate are not at all constrained. Finally, not all of the AB-AA rates have become constrained, but at the same time the screen did not reproduce the small difference between the black and the blue datasets in Figure 2A, c, which is obtained when AB is pre-incubated for 5 minutes with ADP before ATP and AA are added (black dataset).

Prediction of physiological behavior

The iterative screening of the parameters was based only on the best fits of the in vitro experiments (Fig. 2A). We wondered to what extent the optimization of the parameters with the in vitro data would improve their predictions of the physiological behaviour. The physiological response is mediated by σF-bound RNA polymerase holoenzyme. To support sporulation, σF-bound RNA polymerase holoenzyme must be negligible before septation and must subsequently reach micromolar concentrations within 15 minutes or less, as illustrated in the first panel of Figure 2B. Accordingly, we analyzed the time evolution of the σF-RNA polymerase holoenzyme for the different parameter sets.
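The success criterion stated above can be written as a simple predicate on a simulated time course. This is a sketch only: the function name, the numerical threshold for "negligible" (`negligible_uM`) and the exact micromolar target (`target_uM`) are assumptions chosen for illustration, since the text gives the criterion only qualitatively apart from the 15-minute window.

```python
def is_successful_response(times_min, holoenzyme_uM, septation_min,
                           negligible_uM=0.01, target_uM=1.0,
                           window_min=15.0):
    """Classify a time course of the sigma-F holoenzyme concentration.

    The thresholds are illustrative assumptions, not values from the study.
    """
    pairs = list(zip(times_min, holoenzyme_uM))
    before = [c for t, c in pairs if t < septation_min]
    after = [c for t, c in pairs
             if septation_min <= t <= septation_min + window_min]
    quiet_before = all(c < negligible_uM for c in before)
    reaches_target = any(c >= target_uM for c in after)
    return quiet_before and reaches_target

times = [0, 5, 10, 15, 20, 25]          # minutes, septation at t = 10
responsive = [0.0, 0.0, 0.001, 0.5, 1.2, 1.3]
too_weak = [0.0, 0.0, 0.0, 0.01, 0.05, 0.1]
leaky = [0.5, 0.6, 0.7, 1.0, 1.2, 1.3]

results = [is_successful_response(times, course, septation_min=10)
           for course in (responsive, too_weak, leaky)]
```

Here only the first time course passes: the second never reaches the target within the window and the third is already active before septation.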

As can be seen in Figure 2B, the fraction of successful physiological responses increases as the parameter values are optimized for in vitro conditions. Many simulations, however, still fail. By comparing the distribution of each individual parameter in the successful and the unsuccessful in vivo simulations of the final screen (Fig. 5, Table S2), we identified the critical parameters as the ones associated with the IIE-dependent dephosphorylation of AA-P, i.e. the dephosphorylation rate and the association/dissociation rates of the phosphatase (IIE) with its substrate AA-P, as well as the rates of AA-P dimerization. A comparison of the final ranges to the initial ranges of these parameter values (Fig. 5, Table S2) shows that these rates had not been restricted by the in vitro experiments in Fig. 2A. We therefore needed additional data to further restrict these. The previous study identified kinetic rates for these parameter values, based on further NMR experiments21. Once fixed to these values, all simulations predicted the physiological data correctly (Fig. 5), even though most parameters still remained poorly restricted (Fig. 4).
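The comparison of parameter distributions between successful and unsuccessful simulations can be sketched with a two-sample Kolmogorov-Smirnov statistic. The version below is a self-contained illustration in plain Python: the p-value uses the standard asymptotic Kolmogorov series, which is an approximation for finite samples, and the two example groups are invented numbers rather than results from the screen. A statistics library would normally be used in practice.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (approximate)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the value is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, p))

# Invented example values of one parameter in the two groups.
successful = [0.10 + 0.001 * i for i in range(100)]
unsuccessful = [1.00 + 0.010 * i for i in range(100)]

d_critical = ks_statistic(successful, unsuccessful)
p_critical = ks_pvalue(d_critical, len(successful), len(unsuccessful))

# A parameter with identical distributions in both groups.
d_neutral = ks_statistic(successful, successful)
p_neutral = ks_pvalue(d_neutral, len(successful), len(successful))
```

A parameter whose values separate the two groups gives a large statistic and a small p-value, which is the signature used to flag critical parameters; a parameter distributed identically in both groups gives a statistic of zero.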

Figure 5

Predictive value of the optimized parameter sets for the physiological behavior.

The panels are the same as in Fig. 2(B), but simulated after fixing further parameter values based on data in ref. 21, i.e. the AA dephosphorylation rate (k_dephos = 1.26 s⁻¹), the AA-p IIE on & off rates (k_IIEon = 9 × 10⁻² μM⁻¹ s⁻¹, k_IIEoff = 5.8 × 10⁻¹ μM⁻¹ s⁻¹) and the AAp dimer on & off rates (k_dimon = 2 × 10⁻¹ μM⁻¹ s⁻¹, k_dimoff = 1 s⁻¹).

Discussion

Mathematical models are increasingly used to define mechanisms in biology, but typically insufficient data is available to determine all parameter values with high confidence. Most mathematical models have previously only been analysed for one optimal parameter set. Recent advances in computing power now allow the screening of larger parameter spaces. We have exploited this to reanalyse a previously published model, whose predictions had been confirmed in experiments. The model describes the regulation of σF during sporulation of Bacillus subtilis21. By sampling from the parameter ranges we were able to fit the detailed experimental data, while imposing restrictions on only some of the parameter ranges. Subsequently we tested to what extent the parameter sets that allowed us to reproduce the measured in vitro data would also reproduce the in vivo behavior. We found that although the in vivo behavior improved, many of the parameter sets still failed to reproduce it. By comparing successful and unsuccessful simulations we identified the critical parameter values and fixed these with further data. This led to the restriction of further parameter ranges and allowed us to reproduce the physiological behavior. The latter shows that the parameterized model can also be used for experimental design, as it defines the critical rate constants that need to be measured to understand a biological behaviour of interest.

Most parameter ranges remain unconstrained, a phenomenon known as parameter sloppiness. Many models in systems biology exhibit parameter sloppiness30. In fact, there may be an evolutionary role for parameter sloppiness as systems with sloppy parameters are more robust to changes. Systems with sloppy parameters could thus be optimized for various distinct functions without hampering other functions31. Parameter sloppiness also does not necessarily reduce the reliability of the model predictions, at least as long as the unconstrained parameters do not impact on the model predictions. Thus if neither the experimental data nor the biological question of interest depends on a particular parameter value, then it is not a problem if this value is not constrained. However, it is typically very difficult to evaluate whether this is the case. Local sensitivity analyses are only moderately helpful, as the predictions may extend beyond the local reach of such an approximation. The advantage of using the evolutionary sampling approach outlined in this study is that it provides a broader view on the dynamic behaviour of the model that cannot be obtained by a local sensitivity analysis.

We note, however, that in spite of being restricted to a single parameter set, the previous study arrived at predictions that were subsequently confirmed in experiments. This demonstrates that a carefully parameterized model can still lead to reliable predictions, even when studied only locally. In fact, there are ample examples of useful predictions and novel insight that could be gained, although many of the parameter values could not be firmly defined32. In conclusion, despite the limits on the availability of quantitative data, mathematical models can still be very useful in providing novel insight and making interesting predictions.

Methods

Rule description

The following mechanistic rules summarize the well characterized biochemical interactions of the σF system:

  1. Nucleotides binding to AB (ADP to ATP)

    The protein AB has two binding pockets for nucleotides (ATP, ADP). The nucleotide-binding sites are each covered by a flexible lid that can either be in an open or closed conformation. AB is found in two conformations with either low or high affinity for AA. High affinity AB with two open lids can bind two nucleotides (ATP, ADP) in its pockets, irrespective of whether AB is already AA or σF-bound. A single ADP can be exchanged for ATP whenever there is one site that is not AA-bound, irrespective of the AB conformation.

  2. Lid opening-closing

    Protein-unbound AB that contains nucleotides in its pockets can open and close its two lids (simultaneously). Whenever AB is bound by either σF, or AA, or both, its lids close faster and the reverse reaction is then very slow.

  3. AB-σF interaction (σF inactivation)

    σF can bind to AB as long as it has no more than one AA already bound. This interaction does not depend on its lid state or its conformation, but the affinity depends on the nucleotides bound to AB, i.e. the affinity is higher when AB has at least one ATP in its pockets. If AB is nucleotide-free, then the affinity is very low.

  4. AB is an allosteric enzyme

    AB can assume either of two conformational states. The conformation of unbound and σF-bound AB is biased towards a conformation that binds AA with low affinity. AA biases the conformational equilibrium of AB to the high affinity conformation.

  5. AB-AA interaction

    AB can bind to AA as long as σF and another AA are not already both bound to it. The pockets of AB that are ATP-bound have a higher affinity for AA than those that are ADP-bound. Nucleotide-free AB can still bind AA, albeit with much lower affinity.

  6. Exchange between ADP and ATP

    The total amount of nucleotides is conserved throughout each experiment. In the in vitro case the total concentration of nucleotides is fixed; therefore this reaction is not modeled explicitly. For the in vivo case, this is considered to be a relatively fast equilibrium, keeping the ratio of ATP to ADP fixed.

  7. AA phosphorylation

    Whenever AA is in a complex with AB and it is bound at the ATP-bound pocket, it can dissociate upon phosphorylation, leaving AB with an ADP at that pocket.

  8. AA de-phosphorylation

    IIE phosphatase binds to phosphorylated AA (AA-p) and catalyses the release of the phosphate group.

  9. AA-p dimerization

    Free AA-p can form a homo-dimer complex.

  10. Competition of σF and σA for the RNA-polymerase

    Both σF and σA form a complex with the core RNA polymerase. Binding of σA inhibits the formation of the σF-holoenzyme.

  11. Production (in vivo)

    AB, AA, σF and IIE can all be produced, while the concentrations of the core RNA polymerase and of σA are considered to be constant on the timescales of these experiments.

  12. Degradation (in vivo)

    AB is degraded only when unbound. The other proteins have an effective production rate that compensates for their degradation.

Differences between in vitro and in vivo conditions

In the in vitro experiments, core RNA polymerase and σA are not present and these concentrations are therefore set to zero in the simulations. From the previous publication we have data for six distinct experimental set-ups that make use of fluorescence quenching measurements and measurement of protein phosphorylation (Fig. 2A). The experiments have been described in detail previously21. In brief, Figure 2A(a) reports the fraction of σF (total 1.3 μM) that is bound in a complex with AB dimer (total 1 μM) over time in response to the addition of 100 μM ATP (black curve) or ADP (blue curve) after 2 minutes of co-culture in the absence of nucleotides. Figure 2A(b) reports the kinetics of σF-AB unbinding and rebinding upon addition of AA (0.4 μM (red), 0.7 μM (yellow), 1 μM (blue), 1.5 μM (cyan), 2 μM (green) and 3 μM (black)). Figure 2A(c) reports the phosphorylation of AA per AB dimer when AB is directly incubated with 40 μM AA and 100 μM ATP (blue) or pre-incubated for 5 min with either 5 μM ADP (black) or 5 μM ADP and 40 μM AA (red). Figure 2A(d) reports the fraction of σF (total 1.3 μM) that is bound in a complex with AB dimer (total 1 μM) over time in the presence of different concentrations of AA (no AA (black), 2 μM (blue), 4 μM (green) or 6 μM (red) AA). Figure 2A(e) reports the fraction of σF (total 1.3 μM) that is bound in a complex with AB dimer (total 1 μM) over time. σF and AB bind upon addition of ATP (2 min) and subsequently dissociate upon addition of 2.5 μM AA. The extent of σF-AB complex re-formation depends on the concentration of IIE (none (black), 10 nM (green), 40 nM (red) or 100 nM). The last panel, Figure 2A(f), reports the final fraction of σF-bound AB at different concentrations of IIE at two different concentrations of AA (2.5 μM (blue) or 4 μM (red)). 31 parameters were sampled (Table 1), two of them describing the initial concentrations (AB0 and sF0). 
In the bacterium, the genes that encode AB, AA and σF are localised on an operon and the ratio between these concentrations is therefore strongly constrained23. Therefore, the concentration of AA is scaled with respect to the value of the sampled AB. The gene for IIE is not on this operon and the IIE concentration is therefore fixed independently; its concentration can thus differ relative to the others within the sampled parameter space.

In vivo, proteins can also be produced and degraded and production and decay processes therefore need to be considered to model the in vivo situation. Experiments showed that the rate of de-phosphorylation by IIE is 144-fold lower for the physiological salt concentrations than for the salt conditions typically used in in vitro experiments21 and the dephosphorylation rate was adjusted accordingly. Additionally, because of the smaller size of the prespore compartment relative to the mother cell and the accumulation of IIE on both sides of the septum (Fig. 1), the effective concentration of IIE (and of the IIE-bound AA) increases 4-fold upon septation. To compare the two experimental conditions, we used the same parameter sets for the in vitro and the in vivo simulations. The parameters that describe production/degradation, which are absent in the in vitro simulations, are scaled based on the respective sampled values of AB0. Similarly, the concentration ranges of σA and core RNA polymerase were based on the parameter ranges of AB0. The degradation rate of AB was adjusted such that the ratio between degradation and production remains constant.
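The two adjustments that carry a sampled in vitro parameter set over to the prespore after septation can be sketched as below. Only the 144-fold and 4-fold factors come from the text. The function name, the dictionary layout, the key `IIE0` and the example values are hypothetical, and the sketch ignores the production and degradation terms.

```python
def prespore_parameters(in_vitro, salt_fold=144.0, iie_enrichment=4.0):
    """Return a copy of a parameter set adjusted for the prespore.

    salt_fold: reduction of the dephosphorylation rate at physiological salt.
    iie_enrichment: increase of the effective IIE concentration on septation.
    """
    adjusted = dict(in_vitro)  # leave the sampled set untouched
    adjusted["k_dephos"] = in_vitro["k_dephos"] / salt_fold
    adjusted["IIE0"] = in_vitro["IIE0"] * iie_enrichment
    return adjusted

# Hypothetical values chosen so the arithmetic is easy to follow.
example = {"k_dephos": 144.0, "IIE0": 0.5, "k_on": 1.0}
prespore = prespore_parameters(example)
```

All other parameters are passed through unchanged, which mirrors the use of the same parameter sets for both simulation conditions.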

Parameterization and treatment of uncertainty

Previously estimated parameters of the system21 were used as reference values that defined the center of rather wide sampling ranges (Table 1). The parameters describing reaction rates were sampled over four orders of magnitude, while the parameters that describe concentration were sampled over two orders of magnitude. The parameters corresponding to the initial protein concentration levels were sampled over a narrower range, because their measurement is direct and therefore less prone to estimation errors. All parameters were sampled uniformly on a log10 scale; this way sampling is equally likely for any order of magnitude.

Optimizing the parameter sets

Starting with the initial parameter ranges, we sampled 10^6 (N_S) parameter sets, integrated the system of ODEs and compared the output to the experimental data (Fig. 2A). We determined the sum of squared residuals (SSR) between the simulated observable f(x_i) and the experimental values y_i. So for each experiment

where n_k represents the number of curves within the same experiment k, and n_{k,i} corresponds to the set of time points in the domain of each curve of experiment k, which can differ between experiments. Because the experiments differ in their number of curves and time points, we need a measure of the overall residual for each individual sampled parameter set S. Following a naive but objective approach, we therefore normalize each SSR_k by its corresponding number of data points. The resulting total normalized residual treats all experiments equally and is obtained as:

where s is the index of the sampled parameter set. The contribution of noise to the data variation was not considered, because replicates of the same experiments show that the variance of the data does not scale with measurement size and it is similar for the different experiments.
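As an illustration, the residual measure can be sketched as follows. The sketch assumes that SSR_k is summed over all curves and time points of experiment k, divided by the number of data points in that experiment, and that the normalized values are then summed over experiments. This is one reading of the description above; the function names and the data layout are hypothetical.

```python
def ssr(simulated, measured):
    """Sum of squared residuals for one curve (sequences of equal length)."""
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured curves must have equal length")
    return sum((f - y) ** 2 for f, y in zip(simulated, measured))


def normalized_residual(experiments):
    """Total normalized residual of one sampled parameter set.

    `experiments` is a list of experiments; each experiment is a list of
    (simulated, measured) curve pairs evaluated at the same time points.
    ASSUMPTION: each SSR_k is divided by the number of data points in
    experiment k and the results are summed over experiments.
    """
    total = 0.0
    for curves in experiments:
        ssr_k = sum(ssr(sim, obs) for sim, obs in curves)
        n_points = sum(len(obs) for _, obs in curves)
        total += ssr_k / n_points
    return total
```

With this normalization, an experiment with many time points does not dominate the ranking simply because it contributes more terms to the sum.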

Subsequently, we ranked the parameter sets in increasing order of this measure (i.e. the first parameter set corresponds to the best fit to the data) and investigated the effect of this ranking on each parameter. To this end, we compared subsets of different sizes of every ranked parameter with its corresponding total pool (10^6) to detect deviations from the initially assumed uniform distribution. The Kolmogorov-Smirnov test statistic was used to quantify the distance between these two distributions, i.e. the subset versus the whole pool (10^6) of each parameter. The p-value of the Kolmogorov-Smirnov test had a minimum for comparisons with subsets of size 10^3. This subset size was therefore used as representative to indicate which parameters are constrained by the data. The statistic is useful to highlight the distance between distributions, but it is not informative regarding the qualitative differences between them. For this reason we define here a heuristic cut-off criterion to identify the parameters whose subset distributions impose different ranges. This heuristic compares the initial range of each parameter with the range that is defined by the middle 90% of its ranked subset of size 10^3. So, for parameter κ, if
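The ranking and the distribution comparison can be sketched as below. This is a minimal standard-library illustration of the two-sample Kolmogorov-Smirnov statistic, not the implementation used for the analysis; it computes only the statistic and not the p-value, and the helper names are hypothetical.

```python
import bisect


def top_subset(values, residuals, size):
    """Values of one parameter for the `size` best-fitting parameter sets."""
    order = sorted(range(len(residuals)), key=residuals.__getitem__)
    return [values[i] for i in order[:size]]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Both empirical CDFs are step functions, so the largest gap is reached
    at one of the observed data points.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance
```

A statistic close to 0 indicates that the best-fitting subset is distributed like the whole pool, whereas values close to 1 indicate that the data confine the parameter to a small part of its initial range.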

then we further restrict the initial parameter range. To improve the estimates of these quantiles, including in the case of very flat tails, additional bootstrapping was performed for the lower bound (5%) of the 5% quantile and for the upper bound (95%) of the 95% quantile, and the criterion was then tested for these two bootstrapped bounds. The above quantiles define a sub-domain of the initial parameter ranges, and the ranges of the parameters that meet this criterion are then restricted to their corresponding sub-domain. This process is repeated as long as the criterion is met for at least one parameter (Fig. 3).
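One restriction step could be sketched as follows. The cut-off used here, the threshold max_fraction on the log10 width of the middle 90%, is a hypothetical stand-in for the criterion above and is not the criterion used in the analysis. The number of bootstrap resamples and all function names are likewise assumptions.

```python
import math
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_bounds(values, rng, n_boot=200):
    """5% bound of the bootstrapped 5% quantile and 95% bound of the
    bootstrapped 95% quantile (a conservative middle-90% range)."""
    q05, q95 = [], []
    for _ in range(n_boot):
        resample = sorted(rng.choices(values, k=len(values)))
        q05.append(quantile(resample, 0.05))
        q95.append(quantile(resample, 0.95))
    return quantile(sorted(q05), 0.05), quantile(sorted(q95), 0.95)


def restrict_range(initial, subset_values, rng, max_fraction=0.5):
    """Return a narrowed (low, high) range, or `initial` if unchanged.

    ASSUMPTION (hypothetical criterion): the range is narrowed when the
    bootstrapped middle 90% of the subset spans less than `max_fraction`
    of the initial range on a log10 scale. All values must be positive.
    """
    low, high = bootstrap_bounds(subset_values, rng)
    low, high = max(low, initial[0]), min(high, initial[1])
    initial_width = math.log10(initial[1] / initial[0])
    new_width = math.log10(high / low)
    if new_width < max_fraction * initial_width:
        return (low, high)
    return initial
```

In an iterative scheme, the narrowed ranges would be used to draw a fresh pool of parameter sets, and the step would be repeated until no parameter meets the criterion.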

Definition of physiological response

The in vivo experiment starts at time zero with production of all species apart from σA and RNA-polymerase, which are considered to be constant on the timescale of the experiment. Septation occurs at t = 2 h. Upon septation, the production of the proteins stops and the effective concentration of IIE (and IIE-bound AA) increases four-fold because of the difference in cell size. The increase in IIE and IIE-AA levels is immediate, but the response of the σF holoenzyme formation takes about 15 min. For the response to be physiological, the level of holoenzyme before septation must not exceed 0.4 μM, otherwise we would have septation-independent holoenzyme formation. The holoenzyme level should be above 1 μM 15 min after septation and then remain high for some time. Therefore a successful physiological response must fulfill the following:

  1. t = 2 h: [σF-RNA-polymerase] < 0.4 μM

  2. t = 2 h 15 min: [σF-RNA-polymerase] > 1 μM

  3. t = 4 h: 1 μM < [σF-RNA-polymerase]
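The three conditions can be checked on a simulated time course with a short sketch. It assumes that time is given in hours and the holoenzyme concentration in μM, and that values between simulated time points may be linearly interpolated; the function names are hypothetical.

```python
import bisect


def interpolate(times, values, t):
    """Linearly interpolate `values` at time `t`; `times` must be increasing."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the simulated time span")
    j = bisect.bisect_left(times, t)
    if times[j] == t:
        return values[j]
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return values[j - 1] * (1 - w) + values[j] * w


def is_physiological(times_h, holoenzyme_uM):
    """Check the three conditions on the sigma-F holoenzyme level (in uM)."""
    at_septation = interpolate(times_h, holoenzyme_uM, 2.0)   # t = 2 h
    after_15_min = interpolate(times_h, holoenzyme_uM, 2.25)  # t = 2 h 15 min
    at_end = interpolate(times_h, holoenzyme_uM, 4.0)         # t = 4 h
    return at_septation < 0.4 and after_15_min > 1.0 and at_end > 1.0
```

For example, a trajectory that already reaches 0.6 μM at t = 2 h fails the first condition and is therefore classified as non-physiological, regardless of its later behaviour.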