A mechanistic stochastic framework for regulating bacterial cell division

How exponentially growing cells maintain size homeostasis is an important fundamental problem. Recent single-cell studies in prokaryotes have uncovered the adder principle, where cells add a fixed size (volume) from birth to division, irrespective of their size at birth. To mechanistically explain the adder principle, we consider a timekeeper protein that begins to get stochastically expressed after cell birth at a rate proportional to the volume. Cell-division time is formulated as the first-passage time for protein copy numbers to hit a fixed threshold. Consistent with data, the model predicts that the noise in division timing increases with size at birth. Intriguingly, our results show that the distribution of the volume added between successive cell-division events is independent of the newborn cell size. This was dramatically seen in experimental studies, where histograms of the added volume corresponding to different newborn sizes collapsed on top of each other. The model provides further insights consistent with experimental observations: the distribution of the added volume when scaled by its mean becomes invariant of the growth rate. In summary, our simple yet elegant model explains key experimental findings and suggests a mechanism for regulating both the mean and fluctuations in cell-division timing for controlling size.

S1 Theoretical constraints on the molecular mechanism underlying the adder principle The adder principle states that a cell adds a constant volume between birth to division notwithstanding its size at birth [1][2][3]. In order to investigate what molecular mechanisms could realize this behavior, we consider a hybrid system framework shown in Fig. S1. The model here consists of a state variable, ∆V , which denotes the volume added to a cell's volume at birth. The dynamics of ∆V is given by the following ordinary differential equation:∆ V = α(∆V +V b ), (S1. 1) where V b is the volume of a cell at birth, and α represents the growth rate. Furthermore, we assume that the division occurs at an added volume dependent rate h(∆V ). Upon division, the added volume ∆V resets to zero whereas the cell volume at birth resets to (V b + ∆V ) /2. ΔV = Δ + ℎ Δ Δ → 0 → + Δ /2 Figure S1: Description of the cell division process as a stochastic hybrid system. The added volume ∆V evolves as per a deterministic dynamics until the division event takes place. The hazard rate for division is h(∆V ). Upon division, the added volume ∆V and the cell volume at birth V b reset to 0 and (V b + ∆V )/2, respectively.
Using the infinitismal generator of a stochastic hybrid system, one can write the time evolution of expected added volume ∆V as [4]: Using an mean-field approximation, the above equation can be written as In steady state, a solution of equation (S1.3) which has ∆V independent of V b is only possible if the hazard rate has following form: h(∆V ) = 0 for ∆V < ∆V , (S1. 4) h(∆V ) = ∞ for ∆V > ∆V , (S1. 5) where ∆V denotes the volume that a cell attempts to add. In other words, any mechanism which actively senses the added volume has to trigger the division the moment it reaches the prescribed threshold. In the main text, we propose a timekeeper protein to trigger the division. Assuming a deterministic production of the timekeeper protein, results here imply that the division rate follows where x represents the protein level, and X is the prescribed protein copy number threshold for division. This constraint on the division rate was realized by computing the division time as the first-passage time in the main text. Also note that as the main paper describes, the timekeeper protein based mechanism led to realization of the adder model in distribution sense. Therefore the theoretical constraint in equations (S1.6)-(S1.7) is both necessary and sufficient to realize the adder principle of cell size control.

S2 Distribution of FPT given cell volume at birth
As described in the main text (equation (4)), the distribution of the minimum number of burst (transcription) events N required for x(t) to reach the threshold X is computed by using Given a specific form for the distribution of B i , the corresponding distribution for N can be obtained using equation (S2.1) (two specific examples are discussed later in this section).
Having determined the number of bursts needed for cell division, we next focus on the timing of burst events which is determined by the burst arrival rate. Here since the burst arrival rate is time varying (due to dependence on cell volume), the arrival process an inhomogeneous Poisson process. Prior work on inhomogeneous Poisson processes has shown that the distribution for the timing of the n th event is given by [5,6] Note that FPT is the same as the time at which the N th burst event occurs. Thus, the probability density function of FPT is obtained as and is dependent on the newborn cell size V 0 through the function R(t) defined in (S2.2). Next, using the relation in (S2.1), we quantify the distribution N from the distribution of protein burst size B i .

Probability mass function of N for common burst size distributions
We present the form of distribution of N for two relevant cases here: when the protein burst size is one with probability one, and when the protein burst size is geometric [7][8][9][10][11][12]. For the case when the burst size B i is one with probability one, exactly X events are required for the protein level x(t) to reach X for the first time.
That is, we have where δ (n − X) is the Kronecker's delta which is one when n = X and zero otherwise. When the burst size B i follows a geometric distribution [7][8][9][10][11][12], the calculation of the minimum number of transcription events N for this distribution has been previously done in our works [13,14]. The probability mass function of N is given by (S2.5) Here b represents the mean protein burst size. Further, the first three statistical moments of N given by the above probability mass function are These formulas are used in the next section to compute the moments of the volume added between birth to division.

S3 Distribution of volume added ∆V
Let V b denote the volume of a newborn cell. Assuming birth of a cell at t = 0, the volume after at a time t is given by V (t) = V b exp(αt). We assume that the cell divides at the first-passage time whose distribution is given by equation (S2.3). Representing the volume added to the cell's volume at birth until division by ∆V , we have The cumulative distribution function of ∆V can be computed as Differentiating the above expression results in the probability density function of ∆V as follows Hence we can write the probability density of ∆V as Notice that this distribution is an Erlang distribution conditioned to the random variable N.

Moments of ∆V
Mean ∆V Since the distribution of ∆V is conditional Erlang, we have the following expression for mean ∆V .
Second order moment The second order moment of ∆V is given by Third order moment The third order moment of ∆V is given by When the burst size is one with probability one, we have N = X with probability one. The formulas of mean, CV 2 and skewness of ∆V simplify to When the burst size is geometric, we employ the expressions of moments of N from section S1 to get the following expressions: We also note that that the skewness of ∆V is positive in both cases considered above which is consistent with previous results [1].

Scale invariance of the distribution of ∆V
It has been shown in [3] that the distributions of the added volume ∆V in different growth conditions collapse when rescaled by respective ∆V . Mathematically, we want to show that the probability density function f ∆V (v) has the following form: where g(.) is an arbitrary normalized function [3, Supplementary Information equation 36]. For the distribution in equation (S3.9), we consider where N is the expected value of the minimum number of transcription events required for the protein to cross the threshold X. Also, it is related with ∆V as described in equation (S3.10).
For the function g given in equation (S3.18), we have This establishes the scale invariance of the distribution f ∆V (v).
One consequence of the scale invariance property of f ∆V (v) is that the normalized moments ∆V j / ∆V j are independent of the growth conditions [3, Supplementary Information]. This can be checked as follows.
The j th order conditional moment of ∆V (conditioned with respect to N) is given by j th order moment of an Erlang distribution. Thus Therefore using equation (S3.10), we have which is independent of the growth rate. This fact can be used to show that statistical measures such as noise (CV 2 ) and skewness are independent of the growth rate. Take CV 2 for instance. It is defined as By the scale invariance property, ∆V 2 / ∆V 2 is independent of the growth rate α. Thus, the noise CV 2 is also independent of α. Similarly, skewness is given by which is again independent of α by the scale invariance property.

S3-a Model analysis when the timekeeper protein is accumulated between two initiation events
In the previous model formulation, we consider that the timekeeper protein is accumulated between cell birth to division. Here we analyze the model when the accumulation is between two other events in cell cycle (namely initiation of DNA replication), and the corresponding division for an initiation event takes place after a constant time delay T . Assume a cell with volume V init right after an initiation event. At this point, the timekeeper molecules are degraded (or deactivated), and new set of timekeeper proteins are synthesized (or activated) for the next initiation event. These proteins are synthesized at a rate k m V init exp(αt), and the threshold required to be achieved for the next initiation events is θ X where θ is the number of origins of replication right after the previous initiation event. For time being, we will ignore any division events until the next initiation event.
Following the calculations used to derive equation (S3.9), one can write the distribution of the volume added until the next initiation event as where the distribution of the number of transcription events f N (n) is now given by One can write the average volume per origin of replication added between the initiation events are Note that for a deterministic burst, the average volume added per origin of replication is same as the volume added between birth and division for the previous model. For geometric burst case, the added volumes are approximately same. Recall that we had ignored division events between two initiation events for above analysis. If there is a division event between two initiation events, both the origins of replication and the timekeeper proteins accumulated since previous initiation event can be assumed to divide equally into the daughter cells. Consequently, for each daughter cell, the threshold for next initiation event will become approximately half of what it was for the mother cell. Thus, one can ignore the division event and consider the total volume as one cell until the next initiation event takes place. One can, of course, expect stochastic effects in the partitioning of protein molecules but inclusion of them should not change the average behavior discussed here.

S4 Distribution of newborn cell size V b
In this section, we determine the distribution of newborn cell size V b , and discuss its scale invariance property. For this purpose, we consider a newborn cell whose volume is V 0 and observe its volume after subsequent cell cycles. Let V p denote the cell volume after p ≥ 1 cell cycles, then we have where ∆V i denotes the volume added during the i th cell cycle. The random variables ∆V i , i ∈ {1, 2, . . . , p} are independent and identically distributed, and their distribution is same as that of ∆V as given by equation (S3.9). At each division, the cell is assumed to divide symmetrically. The steady-state distribution of V p , i.e., when p → ∞ would give the distribution of V b . Note that the contribution from the initial cell volume V 0 in equation (S4.1) decays exponentially. Therefore, we can make a first approximation as In essence, V p is a weighted sum of independent and identically distributed random variables ∆V i . As the following holds for the weights in above sum in equation (S4.2) one can use standard probability theory arguments to show that the distribution of V p converges. In fact, for the weights we have in equation (S4.2), the mean and variance of V p respectively converge to ∆V , and Furthermore, we can also find the expression of the probability density function of V p by making use of probability density function of random variables ∆V i in equation (S3.9). It is easy to show that the random variables ∆V i := 2 i−p−1 ∆V i have the following family of distribution Note that the random variables ∆V i are independent but not identical anymore. However, they still follow conditional Erlang distributions same as ∆V i and therefore inherit the scale invariance property. The distribution of sum of above Erlang random variables can be written in the following compact form [15,16] where C n,p = 2 p−i+1 k m α n , (S4. 6) and Q i (v), for a given N = n, is a polynomial in v of degree n − 1 with following form: with the coefficients a i, j,n,p computed as One can note that f V p (v) has polynomial terms involving the coefficients 2 p−i+1 which are multiplied by exponential terms involving −2 p−i+1 . Therefore the probability density will converge as p becomes large though it is difficult to find the final expression to which f V p (v) converges.

Scale invariance of cell size at birth and division
The scale invariance property of f V b (v) can be established using equation (S4.2) using the Laplace transform of the probability density of random variables ∆V i [3, Supplementary Information]. Denoting the Laplace transform operator by L , we can write Thus the scale invariance of the distribution of V b also follows. The cell-size at division is given by V b + ∆V . Since both V b and ∆V are scale invariat, this results in scale invariance of distribution of cell-size at division.

S5 Distribution of division times
Recall that the distribution in equation (S2.3) assumes a given newborn cell volume V b . We can uncondition it with respect to distribution of V b to obtain the distribution of division times (denoted by f τ d (t)) is the probability distribution of cell volumes at birth. As f V b (v) has an expression given by equation (S4.5) as p → ∞, we can use it to obtain expression of f τ d (t). Using this relation, the division time distribution is given by From equations (S4.6) and (S4.8), one can note that the parameter α appears in C l,p as α −l , and in a i, j,l,p as α l− j . Therefore, we can write the above expression in where C l,p = α l C l,p , and a i, j,l,p = α l− j a i, j,l,p .
To show scale invariance of the distribution upon scaling by its mean, we compute the mean division as follows.
where 2 F 1 represents the hypergeometric function. The second step above was solved using Mathematica software after changing the dummy variable of the integral via substitution z = e αt + 2 p−i+1 − 1. Note that the mean division time can be written as for a constant A which basically represents the complex expression in equation (S5.6).
The scale invariant of f τ d (t) can be shown by constructing a function g(w) as In this case, we have Remark: While we showed the scale invariance property of the distribution of τ d from above analysis, the expressions here are quite convoluted. In order to approximate the moments of division time, we go back to equation (S5.1), and use an approximate expression for f V b (v). One way of approximating V b would be to use a large finite value of p. However, that doesn't simplify the expression of f τ d (t). We therefore use Even though this is a bad approximation (recall the discussion in section S4 that only their means are same), we use it because we know that probability density of V b would also have a Erlang like expression. Using this approximation yields Protein threshold X K1 Figure S2: Values of the constant K 1 for different values of protein threshold X, and mean burst sizes b.
Moments of division time τ d can be written as where constants K j are shorthand for the summation terms. As expected from a deterministic analysis, and approximate computations performed in [3], the constant K 1 should be approximately equal to log 2. We found that K 1 ≈ 0.7 ≈ log 2 for the two distributions of burst size B i (deterministic, and geometric) used in this manuscript, and several values of threshold X. Some of these values are shown in Figure S2.

S6 Moments of cell-division time given newborn cell size
In this section, we provide details on how FPT (cell division time) moments depend on cell size at birth V b . The result pertaining dependence of mean FPT on V b is provided here. How change in V b affects the noise in FPT has been described in the main text ( Figure 2).

Mean cell-division time as a function of newborn cell size
The expression for FPT probability density in equation (S2.3) can be used to numerically compute the mean FPT as V b is varied. The model predicts that the mean division time decreases as the newborn cell volume is increased. This behavior is consistent with previous understanding of negative correlation between cell division time and newborn cell size [2,3].

Inference of model parameters
Here we discuss how model parameters, namely, transcription rate (k m ), the threshold (X), and the protein burst size (b) used to draw Figure 2a in the main text were inferred. We used the moments of the added volume ∆V to estimated the model parameters from experimental data. The reason behind this is that exact expressions of the moments of added volume are available (equations (S3.10)-(S3.12)). To compute the estimates, we use the nonlinear constrained optimization solver (fmincon) available in MatLab with 0.01 as initial parameter value for k m , X, and b. The estimated values for the transcription rate k m , protein threshold X, and mean burst size b were 0.13, 65.0, and 5.0, respectively. Data processing to obtain Fig. 2b Since the initial size of the cell is a real random variable, we used a binning strategy to get cell cycle time statistics (noise) given different newborn sizes. We organized the newborn cell sizes into four bins: