Fluctuation relations and fitness landscapes of growing cell populations

We construct a pathwise formulation of a growing population of cells, based on two different samplings of lineages within the population, namely the forward and backward samplings. We show that a general symmetry relation, called a fluctuation relation, relates these two samplings, independently of the model used to generate divisions and growth in the cell population. These relations lead to estimators of the population growth rate, which can be very efficient, as we demonstrate by analyzing a set of mother machine data. They also lead to general inequalities between the mean number of divisions and the doubling time of the population. We also study the fitness landscape, a concept based on the two samplings mentioned above, which quantifies the correlations between a phenotypic trait of interest and the number of divisions. We obtain explicit results when the trait is the age or the size, for age and size-controlled models.

While the growth of cell populations appears deterministic, many processes occurring at the single cell level are stochastic. Among many possibilities, stochasticity at the single cell level can arise from stochasticity in the generation times 1, from stochasticity in the partition at division 2,3, or from the stochasticity of single cell growth rates, which are usually linked to stochastic gene expression 4. Ideally, one would like to disentangle the various sources of stochasticity present in experimental data 5. This would make it possible to understand and predict how the various sources of stochasticity affect macroscopic parameters of the cell population, such as the Malthusian population growth rate 6,7. Beyond this specific question, research in this field attempts to elucidate the fundamental physical constraints which control growth and divisions in cell populations.
With the advances in single cell experiments, where the growth and divisions of thousands of individual cells can be tracked, robust statistics can be acquired. New theoretical methods are needed to exploit this kind of data and to relate experiments carried out at the population level with experiments carried out at the single cell level. For instance, one would like to relate single-cell time-lapse video microscopy experiments of growing cell populations 8, which provide information on all the lineages in the branched tree, with experiments carried out with the mother machine configuration, which provide information on single lineages 9,10.
Let us now briefly review how the issue was addressed theoretically. In 2015, a pathwise thermodynamic framework was built for population dynamics using large deviation theory. One important result was a variational principle for the population growth rate 11, which was formulated in terms of two key path distributions, namely the chronological and the retrospective probability distributions. Then, in order to explain their experimental observation that populations of Escherichia coli double faster than the mean doubling time of their constituent single cells, Hashimoto et al. extended the classical work of Powell 6 for age models without mother-daughter correlations 12. Nozoe et al. 13 then showed that the difference between the forward (chronological) and backward (retrospective) distributions can be used to define a quantity called the phenotypic fitness landscape, which informs whether a specific phenotypic trait affects the population growth rate. In that work, they also derived the key relation between the two distributions, already known in the mathematical literature of branching processes 14, but they did not connect this result with the field of fluctuation relations. Various theoretical works followed which addressed other aspects of the role of stochasticity at the single cell level 3,15,16. As far as we can tell, the connection between the results of Nozoe et al. and the field of fluctuation relations was only made explicit in our first work on this topic 17. In that work, we also derived inequalities for mean generation times, already obtained in 12 for age models, but importantly we proved using the fluctuation relation that they are valid beyond age models, in particular for a broad class of size models.
In the present work, we further investigate the connection between the statistics at the single lineage and population levels using fluctuation relations.
These fluctuation relations only depend on the structure of the branched tree and not on the class of dynamical variables (such as protein concentration, cell size or cell age) defined on it. These relations imply that the above inequalities for the mean number of divisions or mean generation times hold regardless of the specific dynamics.
We then provide an interpretation of the fluctuation relations within Stochastic Thermodynamics 18 and we explore some consequences. One application concerns the inference of the population growth rate using single lineage data 19. We then introduce some specific dynamical models, which we simulate to confirm the fluctuation relations and their consequences for mean generation times. Then, for these specific models and for key phenotypic variables such as the size and the age, we also study the fitness landscape 13.

Theoretical framework
The backward and forward processes. Let us consider a branched tree, starting with $N_0$ cells at time $t = 0$ and ending with $N(t)$ cells at time $t$, as shown in Fig. 1. We assume that all lineages survive up to time $t$, and therefore the final number $N(t)$ of cells corresponds to the number of lineages in the tree.
The most natural way to sample the lineages is to put uniform weights on all of them. This sampling is called backward (or retrospective) because at the end of the experiment one randomly chooses one lineage among the $N(t)$ with a uniform probability and then traces the history of the lineage backward in time from time $t$ to 0, until reaching the ancestor population. The backward weight associated with a lineage $l$ is defined as
$$\omega_{\mathrm{back}}(l) = \frac{1}{N(t)}. \tag{1}$$
In a tree, some lineages divide more often than others, which results in an over-representation of lineages that have divided more often than the average. Therefore, by choosing a lineage with uniform probability, we are more likely to choose a lineage with more divisions than the average number of divisions in the tree.
The other way of sampling a tree is the forward (or chronological) one, which consists in putting the weight
$$\omega_{\mathrm{for}}(l) = \frac{1}{N_0\, m^{K(l)}} \tag{2}$$
on a lineage $l$ with $K(l)$ divisions, where $m$ is the number of offspring at division. This choice of weights is called forward because one starts at time 0 by uniformly choosing one cell among the $N_0$ initial cells, and one goes forward in time up to time $t$, by choosing one of the $m$ offspring with equal weight $1/m$ at each division. The backward and forward weights are properly normalized probabilities, defined on the $N(t)$ lineages in the tree at time $t$:
$$\sum_{i=1}^{N(t)} \omega_{\mathrm{back}}(l_i) = \sum_{i=1}^{N(t)} \omega_{\mathrm{for}}(l_i) = 1.$$
Single lineage experiments are precisely described by a forward process since experimentally, at each division, only one of the two daughter cells is conserved while the other is eliminated (for instance flushed away in a microfluidic channel 9,10). In these experiments, a tree is generated but at each division only one of the two lineages is conserved, with probability 1/2, while the rest of the tree is eliminated. This means that single lineage observables can be measured without single lineage experiments, provided population experiments are analyzed with the correct weights on lineages.

Link with the population growth rate. Since the backward weight put on a lineage depends on the number of cells at time $t$, it takes into account the reproductive performance of the colony but it is unaffected by the reproductive performance of the lineage considered. On the contrary, the forward weight put on a specific lineage depends on the number of divisions of that lineage but is unaffected by the reproductive performance of other lineages in the tree. Therefore, the difference between the values of the two weights for a particular lineage informs on the difference between the reproductive performance of the lineage and that of the colony.
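We now introduce the population growth rate:
$$\Lambda_t = \frac{1}{t}\ln\frac{N(t)}{N_0}, \tag{3}$$
which is linked to forward weights by the relation
$$\frac{N(t)}{N_0} = \sum_{i=1}^{N(t)} \omega_{\mathrm{for}}(l_i)\, m^{K_i} = \langle m^{K}\rangle_{\mathrm{for}}, \tag{4}$$
where $\langle\cdot\rangle_{\mathrm{for}}$ is the average over the lineages weighted by $\omega_{\mathrm{for}}$, and $K_i = K(l_i)$. Combining the two equations above, we obtain 19:
$$\Lambda_t = \frac{1}{t}\ln\langle m^{K}\rangle_{\mathrm{for}}, \tag{5}$$
which allows an experimental estimation of the population growth rate from the knowledge of the forward statistics only. Equation (4) can also be rewritten to express the bias between the forward and backward weights of the same lineage,
$$\frac{\omega_{\mathrm{back}}(l)}{\omega_{\mathrm{for}}(l)} = \frac{m^{K(l)}}{\langle m^{K}\rangle_{\mathrm{for}}},$$
which is the reproductive performance of the lineage divided by its average in the colony with respect to $\omega_{\mathrm{for}}$.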
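As a rough illustration of the two samplings, the sketch below builds a small random binary tree and evaluates the backward weights $1/N(t)$, the forward weights $m^{-K}/N_0$, and the growth rate obtained either from the cell count or from forward statistics alone. The branching model (independent exponential generation times), the function names and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def simulate_divisions(t_end, rate=1.0, seed=0):
    """Return the number of divisions K of every lineage alive at t_end.

    Assumption (illustration only): one ancestor cell, binary division (m = 2),
    independent exponential generation times with the given rate.
    """
    rng = random.Random(seed)
    stack = [(0.0, 0)]          # (birth time, divisions so far) of pending cells
    counts = []
    while stack:
        t_birth, k = stack.pop()
        t_div = t_birth + rng.expovariate(rate)
        if t_div > t_end:
            counts.append(k)    # the cell is alive at t_end: it closes one lineage
        else:
            stack.append((t_div, k + 1))   # first daughter
            stack.append((t_div, k + 1))   # second daughter
    return counts


def lineage_weights(counts, n0=1, m=2):
    """Backward (uniform) and forward (m**-K / N0) weights of each lineage."""
    n_t = len(counts)
    w_back = [1.0 / n_t for _ in counts]
    w_for = [m ** (-k) / n0 for k in counts]
    return w_back, w_for


def growth_rate_from_forward(counts, t, m=2):
    """Estimate Lambda_t as ln(<m**K>_for) / t from forward-weighted statistics."""
    _, w_for = lineage_weights(counts, m=m)
    return math.log(sum(w * m ** k for w, k in zip(w_for, counts))) / t


T_END = 5.0
K_COUNTS = simulate_divisions(T_END)
W_BACK, W_FOR = lineage_weights(K_COUNTS)
LAMBDA_DIRECT = math.log(len(K_COUNTS)) / T_END        # (1/t) ln(N(t)/N0), N0 = 1
LAMBDA_FORWARD = growth_rate_from_forward(K_COUNTS, T_END)
```

On a complete tree the two values of the growth rate coincide by construction; the forward form becomes useful when only forward-sampled lineages are available.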
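A similar relation is derived using the relation
$$\frac{N_0}{N(t)} = \sum_{i=1}^{N(t)} \omega_{\mathrm{back}}(l_i)\, m^{-K_i} = \langle m^{-K}\rangle_{\mathrm{back}},$$
so that $\Lambda_t = -\ln\langle m^{-K}\rangle_{\mathrm{back}}/t$. Combining the definitions of the two weights with that of $\Lambda_t$, we obtain
$$\omega_{\mathrm{back}}(l) = \omega_{\mathrm{for}}(l)\, e^{K(l)\ln m - t\Lambda_t}. \tag{10}$$
If we now introduce the probability distribution of the number of divisions for the forward sampling, $p_{\mathrm{for}}(K,t) = \sum_l \delta(K - K(l))\,\omega_{\mathrm{for}}(l)$, and similarly for the backward sampling, we can also recast the above relation as a fluctuation relation for the distribution of the number of divisions:
$$p_{\mathrm{back}}(K,t) = p_{\mathrm{for}}(K,t)\, e^{K\ln m - t\Lambda_t}. \tag{11}$$
Let us now introduce the Kullback-Leibler divergence between two probability distributions $p$ and $q$, which is the non-negative number:
$$D_{\mathrm{KL}}(p\|q) = \sum_x p(x)\ln\frac{p(x)}{q(x)} \ge 0. \tag{12}$$
Using Eq. (10), we obtain
$$D_{\mathrm{KL}}(\omega_{\mathrm{back}}\|\omega_{\mathrm{for}}) = \langle K\rangle_{\mathrm{back}}\ln m - t\Lambda_t \ge 0. \tag{13}$$
A similar inequality follows by considering $D_{\mathrm{KL}}(\omega_{\mathrm{for}}\|\omega_{\mathrm{back}})$. Finally we obtain
$$\frac{t}{\langle K\rangle_{\mathrm{back}}} \le \frac{\ln m}{\Lambda_t} \le \frac{t}{\langle K\rangle_{\mathrm{for}}}. \tag{14}$$
In the long time limit, $\lim_{t\to+\infty} t/\langle K\rangle_{\mathrm{back}} = \langle\tau\rangle_{\mathrm{back}}$, where $\tau$ is the inter-division time, or generation time, defined as the time between two consecutive divisions on a lineage. The same argument goes for the forward average. In the case of cell division where each cell gives birth to exactly two daughter cells ($m = 2$), the center term in the inequality tends to the population doubling time $T_d$. Therefore, this inequality reads in the long time limit:
$$\langle\tau\rangle_{\mathrm{back}} \le T_d \le \langle\tau\rangle_{\mathrm{for}}. \tag{15}$$
Let us now mention a minor but subtle point related to this long time limit. For a lineage with $K$ divisions up to time $t$, we can write $t = a + \sum_{i=1}^{K}\tau_i$, where $a$ is the age of the cell at time $t$ and where $\tau_i$ is the generation time associated with the $i$th division. Then $t/K = \tau_m + a/K$, where $\tau_m$ is the mean generation time along the lineage. For finite times, all we can deduce is $t/K \ge \tau_m$. Therefore the left inequality of Eq. (15) always holds, while the right inequality does not necessarily hold at finite time.
Scientific Reports | (2020) 10:11889 | https://doi.org/10.1038/s41598-020-68444-x | www.nature.com/scientificreports/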
Inspired by the work of Powell 6, the inequalities of Eq. (15) have been theoretically derived in 12 for age models. In our previous work 17, we replotted the experimental data of 12, which confirm these inequalities, and we showed theoretically that the same inequalities should also hold for size models. In fact, as the present derivation shows, Eq. (14) is very general and only depends on the branching structure of the tree, while Eq. (15) requires in addition the existence of a steady state. These inequalities and Eq. (11) express fundamental constraints between division and growth, which should hold for any model.
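For readers who want the intermediate step, the inequalities on the mean number of divisions follow in two lines from the relation between the weights, $\omega_{\mathrm{back}}(l) = \omega_{\mathrm{for}}(l)\,e^{K(l)\ln m - t\Lambda_t}$, and from the non-negativity of the Kullback-Leibler divergence. The layout below is only a sketch of that argument in the notation of this section.

```latex
\begin{align}
D_{\mathrm{KL}}(\omega_{\mathrm{back}}\|\omega_{\mathrm{for}})
 &= \sum_{i=1}^{N(t)} \omega_{\mathrm{back}}(l_i)\,
    \ln\frac{\omega_{\mathrm{back}}(l_i)}{\omega_{\mathrm{for}}(l_i)}
  = \langle K\rangle_{\mathrm{back}} \ln m - t\Lambda_t \;\ge\; 0, \\
D_{\mathrm{KL}}(\omega_{\mathrm{for}}\|\omega_{\mathrm{back}})
 &= \sum_{i=1}^{N(t)} \omega_{\mathrm{for}}(l_i)\,
    \ln\frac{\omega_{\mathrm{for}}(l_i)}{\omega_{\mathrm{back}}(l_i)}
  = t\Lambda_t - \langle K\rangle_{\mathrm{for}} \ln m \;\ge\; 0 .
\end{align}
```

Rearranging the first line gives $t/\langle K\rangle_{\mathrm{back}} \le \ln m/\Lambda_t$ and the second gives $\ln m/\Lambda_t \le t/\langle K\rangle_{\mathrm{for}}$, which together are the two sides of Eq. (14).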

Stochastic thermodynamic interpretation.
The results derived above have a form similar to that found in Stochastic Thermodynamics 18. According to this framework, Eq. (5) is an integral fluctuation relation (similar to the Jarzynski relation) while Eq. (11) is a detailed fluctuation relation (similar to the Crooks fluctuation relation). Furthermore, the inequalities of Eq. (14) represent a constraint equivalent to the second law of thermodynamics, which classically follows from the Jarzynski or Crooks fluctuation relations. It is known that these inequalities take a slightly different form when expressed at finite time or at steady state, which is indeed the case here when comparing Eq. (14) with Eq. (15). A difference between work fluctuation relations like Crooks or Jarzynski and Eqs. (5) and (11) is that Crooks or Jarzynski describe non-autonomous systems which are driven out of equilibrium by the application of a time-dependent protocol, whereas the relations for cell growth derived here concern autonomous systems, in the absence of any external protocol.
One of the main applications of Jarzynski or Crooks fluctuation relations concerns the thermodynamic inference of free energies from non-equilibrium fluctuations. Similarly, Eq. (5) or Eq. (11) can be used as estimators of the population growth rate. The specific advantage of Eq. (5) with respect to Eq. (11) is that it only requires single lineage statistics, which can be obtained from mother machine experiments. Let us now show how this can be done in practice. We use the data from 20, where the growth of many independent lineages of E. coli has been recorded over 70 generations in a mother machine at three different temperatures (25 °C, 27 °C, and 37 °C), precisely 65 lineages for 25 °C, 54 for 27 °C, and 160 for 37 °C. For each temperature condition, we study the convergence of the estimator of the population growth rate based on Eq. (5), which we call $\Lambda_{\mathrm{lin}}$, as a function of the length $t$ of the lineages for a fixed number $L$ of independent lineages, and as a function of the number of independent lineages for a fixed observation time.
Firstly, for each temperature, we take into account all the lineages available and truncate them at an arbitrary time $t$ smaller than the length of the shortest lineage of the set. On these portions of lineages of length $t$, we compute $\Lambda_{\mathrm{lin}}$ versus the time $t$, as shown in Fig. 2a. We see that the estimator $\Lambda_{\mathrm{lin}}$ starts from zero, increases and eventually converges rather quickly towards a limiting value. The limits we found agree with the independent analysis carried out in 19, with one caveat: these authors reported that their estimator started at high values and then decreased towards the limit, while in our case the estimator starts at zero and later increases towards the limit. In our case, the estimator needs to be zero at short times, before the first divisions occur.
Secondly, we truncate all the lineages at a fixed time equal to the length of the shortest lineage of the set, and compute $\Lambda_{\mathrm{lin}}$ versus the number $L$ of lineages considered for the estimation, which have been randomly selected from the ensemble of available lineages. As shown in Fig. 2b for the case at 37 °C (curves for the other temperatures look the same), the convergence is also excellent in that case. Although the value of the population growth rate obtained in this way cannot be measured independently from the evolution of the population in the mother machine setup, this convergence is indicative of the success of the method. The figure also confirms that the value of the population growth rate deduced from the estimator $\Lambda_{\mathrm{lin}}$ is larger than $\ln(2)/\langle\tau\rangle_{\mathrm{for}}$, as predicted by the right inequality of Eq. (15).
Here, the estimator is found to provide an excellent estimation, but this is not always so. For instance, for the inference of free energies from non-equilibrium work measurements, the exponential average of the estimator is often dominated by rare values, which are not accessible or not well sampled 21. To understand why this problem does not arise here, we show in the inset of Fig. 2b the distribution $P(K)$ of the number of divisions together with the same distribution weighted by the factor $2^K$ and normalized. The peak of that modified distribution informs on the dominant values in the estimator 21. Here, we observe that both distributions have a narrow support and are close to each other. The weighted distribution is peaked at $K = 67$ while $P(K)$ is peaked at $K = 66$; typical and dominating values are therefore very close, which explains why the estimator is good.
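A minimal sketch of how such an estimator could be evaluated from mother machine records is given below, assuming that each lineage has been reduced to its number of divisions $K$ over a common duration $t$. The function names and the example counts are hypothetical (they are not the data of ref. 20), and the log-sum-exp form is simply one way to avoid overflow of $2^K$ for large $K$.

```python
import math


def lambda_lin(division_counts, t, m=2):
    """Single-lineage estimator (1/t) * ln(mean over lineages of m**K).

    division_counts holds one integer K per independent lineage, all observed
    over the same duration t. A log-sum-exp form avoids overflow for large K.
    """
    logs = [k * math.log(m) for k in division_counts]
    top = max(logs)
    log_mean = top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))
    return log_mean / t


def dominant_count(division_counts, m=2):
    """Most probable K before and after reweighting the histogram by m**K."""
    hist = {}
    for k in division_counts:
        hist[k] = hist.get(k, 0) + 1
    typical = max(hist, key=lambda k: hist[k])
    dominant = max(hist, key=lambda k: hist[k] * m ** k)
    return typical, dominant


# Hypothetical division counts for nine lineages (illustration only).
EXAMPLE_COUNTS = [65, 66, 66, 66, 66, 67, 67, 67, 65]
EXAMPLE_T = 70.0   # observation time in arbitrary units
EXAMPLE_RATE = lambda_lin(EXAMPLE_COUNTS, EXAMPLE_T)
EXAMPLE_PEAKS = dominant_count(EXAMPLE_COUNTS)
```

When the typical and dominant values returned by `dominant_count` are close, as in this toy sample, the exponential average is controlled by well-sampled lineages, which is the situation described above.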
Let us now further develop the Stochastic Thermodynamic interpretation of our results by analyzing the implications of the previous fluctuation relations when dynamical variables are introduced on the branched tree of the population. Let us introduce $M$ variables labeled $(y_1, y_2, \dots, y_M)$ to describe a dynamical state of the system; a path is then fully determined by the values of these variables at division and the times of each division. We call $y(t) = (y_1(t), y_2(t), \dots, y_M(t))$ a vector state at time $t$ and $\{y\} = \{y(t_j)\}_{j=1}^{K}$ a path with $K$ divisions. For cell growth models, the variables $y_i$ can typically be the size and age of the cell, or the concentration of a key protein.
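The probability $\mathcal{P}$ of path $\{y\}$ is defined as the sum over all lineages of the weights of the lineages that follow the path $\{y\}$:
$$\mathcal{P}(\{y\},K,t) = \sum_{i=1}^{N(t)} \omega(l_i)\,\delta(K - K_i)\,\delta(\{y\} - \{y\}_i),$$
where $\{y\}_i$ is the path followed by lineage $l_i$. Using the normalization of the weights $\omega$ on the lineages, we show that $\mathcal{P}$ is properly normalized: $\int \mathrm{d}\{y\} \sum_K \mathcal{P}(\{y\},K,t) = 1$. We then define the number $n(\{y\},K,t)$ of lineages in the tree at time $t$ that follow the path $\{y\}$ with $K$ divisions:
$$n(\{y\},K,t) = \sum_{i=1}^{N(t)} \delta(K - K_i)\,\delta(\{y\} - \{y\}_i).$$
This number of lineages is normalized as $\int \mathrm{d}\{y\} \sum_K n(\{y\},K,t) = N(t)$. Then, the path probability can be rewritten as
$$\mathcal{P}_{\mathrm{back}}(\{y\},K,t) = \frac{n(\{y\},K,t)}{N(t)}, \qquad \mathcal{P}_{\mathrm{for}}(\{y\},K,t) = \frac{n(\{y\},K,t)}{N_0\, m^{K}}.$$
Since $n(\{y\},K,t)$ is independent of a particular choice of lineage weighting, we obtain
$$\mathcal{P}_{\mathrm{back}}(\{y\},K,t) = \mathcal{P}_{\mathrm{for}}(\{y\},K,t)\, e^{K\ln m - t\Lambda_t},$$
which generalizes Eq. (11). In our previous work 17, we derived this relation for size models with individual growth rate fluctuations (i.e. $y = (x, \nu)$), but we were not aware of the weighting method introduced by 13 and, for this reason, we used the term 'tree' to denote the backward sampling and the term 'lineage' to denote the forward sampling.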
This relation has a familiar form in Stochastic Thermodynamics. The central quantity called entropy production can indeed be expressed similarly as the relative entropy between probability distributions associated with a forward and a backward evolution. In this analogy, $\{y\}$ is analogous to the trajectory and $t\Lambda_t - K\ln m$ is analogous to the entropy production. Then, the equivalent of a reversible trajectory, for which the entropy production is null, is a lineage for which the number $K$ of divisions is equal to $t\Lambda_t/\ln m$, that is, a lineage having the same reproductive performance as that of the colony. When all the lineages in a tree have this property, there is no variability of the number of divisions among them. In that case, the forward and backward distributions are identical, and the cost function $t\Lambda_t - K\ln m$ vanishes for all lineages.
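The analogy can be checked by hand on a very small tree. The sketch below uses a hypothetical four-lineage binary tree (one ancestor, $m = 2$) and evaluates the entropy-production-like quantity $\sigma = t\Lambda_t - K\ln m$ on each lineage; the tree, the observation time and the variable names are illustrative assumptions.

```python
import math

# Hypothetical binary tree with one ancestor: after each division one daughter
# stops dividing and the other divides again, giving lineages with 1, 2, 3, 3
# divisions (the sum of 2**-K over lineages equals 1).
K = [1, 2, 3, 3]
T = 1.0                    # observation time (arbitrary units)
M = 2                      # offspring per division

N_T = len(K)
LAMBDA_T = math.log(N_T) / T                 # population growth rate with N0 = 1
W_BACK = [1.0 / N_T] * N_T                   # backward (uniform) weights
W_FOR = [M ** (-k) for k in K]               # forward weights

# Analogue of the entropy production of each lineage: ln(w_for / w_back).
SIGMA = [T * LAMBDA_T - k * math.log(M) for k in K]

# Integral fluctuation relation: the forward average of exp(-sigma) equals 1.
INTEGRAL_FR = sum(w * math.exp(-s) for w, s in zip(W_FOR, SIGMA))
# Second-law-like statements: the forward mean of sigma is a Kullback-Leibler
# divergence (non-negative), the backward mean is minus one (non-positive).
MEAN_SIGMA_FOR = sum(w * s for w, s in zip(W_FOR, SIGMA))
MEAN_SIGMA_BACK = sum(w * s for w, s in zip(W_BACK, SIGMA))
```

In this example only the lineage with $K = 2 = t\Lambda_t/\ln 2$ has $\sigma = 0$, which is the counterpart of a reversible trajectory.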

Mixed age-size controlled models
Dynamics at the population level. The state of a cell is described by its size $x$, its age $a$ and its individual growth rate $\nu$, with $y = (x, a, \nu)$. Such a mixed size-age model includes the 'adder', in which the cell divides after adding a constant volume to its birth volume 22-25.

Figure 2 (partial caption). [...] $\ln(2)/\langle\tau\rangle_{\mathrm{for}}$, which is smaller than the limit value of $\Lambda_{\mathrm{lin}}$, as expected from the second law-like inequality, namely Eq. (15). In the inset, the purple histogram is the distribution of the number of divisions, while the green filled histogram is the histogram deduced from it by weighting it by a factor $2^K$ and normalizing. All the 160 lineages were used to plot these histograms.
The evolution of the number of cells $n(y, K, t)$ in the state $y$ at time $t$ that belong to a lineage with $K$ divisions up to time $t$ is governed by the equation
$$(\partial_t + \partial_a)\, n(y,K,t) + \partial_x\left[\nu x\, n(y,K,t)\right] + B(y)\, n(y,K,t) = 0, \tag{21}$$
and the boundary condition
$$n(x, a=0, \nu, K, t) = m \int \mathrm{d}y'\, \Sigma(y|y')\, B(y')\, n(y', K-1, t), \tag{22}$$
where $B(y)$ is the division rate and $\Sigma(y|y')$ is the conditional probability (also called division kernel) for a newborn cell to be in state $y$ knowing its mother divided while in state $y'$, normalized as $\int \Sigma(y|y')\,\mathrm{d}y = 1$ for any $y'$.
Dynamics at the probability level. While $n(y, K, t)$ in Eq. (21) is independent of the choice of weights put on the lineages, we now turn to a description in terms of the probability $p(y, K, t)$ for a cell to be in state $(y, K)$ at time $t$ if chosen randomly among the $N(t)$ cells in the tree at that time. To do so, one has to choose how to weight each cell in the colony, which is equivalent to weighting each lineage, since at time $t$ each cell is the ending point of one lineage.
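The first possibility is the backward sampling, for which each lineage is weighted uniformly. In this case, we define $p_{\mathrm{back}}$ as
$$p_{\mathrm{back}}(y,K,t) = \frac{n(y,K,t)}{N(t)}. \tag{23}$$
Dividing Eq. (21) and the boundary condition Eq. (22) by $N(t)$, we obtain
$$(\partial_t + \partial_a)\, p_{\mathrm{back}}(y,K,t) + \partial_x\left[\nu x\, p_{\mathrm{back}}(y,K,t)\right] + \left[B(y) + \Lambda_p(t)\right] p_{\mathrm{back}}(y,K,t) = 0 \tag{24}$$
and
$$p_{\mathrm{back}}(x, a=0, \nu, K, t) = m \int \mathrm{d}y'\, \Sigma(y|y')\, B(y')\, p_{\mathrm{back}}(y', K-1, t), \tag{25}$$
where we defined the instantaneous population growth rate as
$$\Lambda_p(t) = \frac{\dot N(t)}{N(t)}. \tag{26}$$
The instantaneous population growth rate and the population growth rate defined in Eq. (3) are related by:
$$\Lambda_t = \frac{1}{t}\int_0^t \Lambda_p(t')\,\mathrm{d}t'. \tag{27}$$
In the long-time limit, $N$ grows exponentially with constant rate $\Lambda_p$, and thus $\Lambda_t = \Lambda_p = \Lambda$.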
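The growth-rate term in the backward equation comes entirely from the time dependence of the normalization $N(t)$. Assuming the definitions $p_{\mathrm{back}} = n/N(t)$ and $\Lambda_p(t) = \dot N(t)/N(t)$, the step can be sketched as follows; the derivatives with respect to $a$ and $x$ do not act on $N(t)$ and are simply divided by it.

```latex
\begin{align}
\partial_t p_{\mathrm{back}}
 &= \frac{\partial_t n}{N(t)} - \frac{\dot N(t)}{N(t)}\,\frac{n}{N(t)}
  = \frac{\partial_t n}{N(t)} - \Lambda_p(t)\, p_{\mathrm{back}}, \\
\Lambda_t
 &= \frac{1}{t}\ln\frac{N(t)}{N_0}
  = \frac{1}{t}\int_0^t \frac{\dot N(t')}{N(t')}\,\mathrm{d}t'
  = \frac{1}{t}\int_0^t \Lambda_p(t')\,\mathrm{d}t' .
\end{align}
```

The first line explains why the division rate $B(y)$ is supplemented by $\Lambda_p(t)$ in the backward dynamics, and the second why the time-averaged and instantaneous growth rates coincide once $N(t)$ grows exponentially.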
The other possibility is to use the forward statistics, in which case we define the probability $p_{\mathrm{for}}$ as
$$p_{\mathrm{for}}(y,K,t) = \frac{n(y,K,t)}{N_0\, m^{K}}. \tag{28}$$
Dividing Eq. (21) and the boundary condition Eq. (22) by $N_0\, m^K$, we obtain
$$(\partial_t + \partial_a)\, p_{\mathrm{for}}(y,K,t) + \partial_x\left[\nu x\, p_{\mathrm{for}}(y,K,t)\right] + B(y)\, p_{\mathrm{for}}(y,K,t) = 0 \tag{29}$$
and
$$p_{\mathrm{for}}(x, a=0, \nu, K, t) = \int \mathrm{d}y'\, \Sigma(y|y')\, B(y')\, p_{\mathrm{for}}(y', K-1, t). \tag{30}$$
One can notice that the backward statistics are well suited to study the population, while the forward statistics reproduce the behaviour of single lineage experiments. Indeed, by taking Eqs. (24) and (25) for the backward probability $p_{\mathrm{back}}$, and setting $\Lambda_p(t) = 0$ and $m = 1$, we recover Eqs. (29) and (30). The resulting equation is a population equation in which only one cell is followed, so that $\Lambda_p(t) = 0$ and $m = 1$; this is what we call a single lineage experiment.

Illustration of the fluctuation relation. We simulated the time evolution of colonies of cells obeying Eqs. (21) and (22), for age and size models, in order to illustrate the fluctuation relation. Since the results are very similar for age models, as expected, we restrict ourselves to size models. We tested two results: the fluctuation relation for the number of divisions, Eq. (11), and one of its consequences, the inequality for the mean number of divisions, Eq. (14). All simulations for size models were conducted with the division rate $B(x, \nu) = \nu x^{\alpha}$, where $\alpha$ is the strength of the control and $x$ is the dimensionless size. Power laws were found to be good approximations for empirical division rates $B(x)$ 2,24,26. The factor $\nu$, being the only time scale for size models, gives $B(x)$ its proper dimension. Similarly, for age models 26, we choose $B(a, \nu) = \nu a^{\alpha}$.
In Fig. 3a, the backward and forward probability distributions of the number of divisions are shown for a size model. The two distributions intersect at the number of divisions $K = t\Lambda_t/\ln 2$. The inset of Fig. 3a shows the logarithm of the ratio $q(K, t) = p_{\mathrm{for}}(K, t)/p_{\mathrm{back}}(K, t)$ of the two distributions, which is, as expected, a straight line of slope $-\ln 2$ when plotted against the number of divisions. For convenience and for Fig. 3a only, noise in the volume partition at division has been introduced by choosing for the conditional probability $\Sigma(x|x')$ a uniform distribution between sizes $x = 0$ and $x = x'$. This has the effect of broadening the distributions $p(K)$ with respect to the case of deterministic symmetrical volume partition.
Then, we tested the inequality on the mean numbers of divisions by varying the strength of the size control $\alpha$. Results are shown in Fig. 3b. On the one hand, the weaker the control on size, the larger the discrepancy between the two determinations based on $\langle K\rangle_{\mathrm{back}}$ and $\langle K\rangle_{\mathrm{for}}$. On the other hand, when the control increases, the two determinations converge to the population doubling time: no stochasticity in the number of divisions is left and every lineage carries the same number of divisions, so that the backward and forward statistics become identical.
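A simulation of this kind can be sketched in a few lines. The version below is not the code used for the figures: it assumes exponential single-cell growth at a fixed rate $\nu$, a division rate $B(x) = \nu x^{\alpha}$ with $\alpha > 0$, deterministic symmetric division and a single ancestor, and it draws each division time by inverting the cumulative hazard. All names and parameter values are illustrative.

```python
import math
import random


def simulate_size_model(t_end, alpha=2.0, nu=1.0, x0=1.0, seed=1):
    """Divisions per lineage at t_end for a size-controlled binary tree.

    Sketch under stated assumptions: growth x(t) = x_b * exp(nu * t), division
    rate B(x) = nu * x**alpha with alpha > 0, symmetric division, one ancestor
    of size x0. A division time tau solves
    x_b**alpha * (exp(alpha * nu * tau) - 1) / alpha = E, with E ~ Exp(1).
    """
    rng = random.Random(seed)
    stack = [(0.0, x0, 0)]      # (birth time, birth size, divisions so far)
    counts = []
    while stack:
        t_birth, x_birth, k = stack.pop()
        e = rng.expovariate(1.0)
        tau = math.log(1.0 + alpha * e / x_birth ** alpha) / (alpha * nu)
        if t_birth + tau > t_end:
            counts.append(k)    # cell alive at t_end: it closes one lineage
        else:
            x_daughter = 0.5 * x_birth * math.exp(nu * tau)
            stack.extend([(t_birth + tau, x_daughter, k + 1)] * 2)
    return counts


def division_statistics(counts, t_end):
    """Growth rate and forward/backward mean number of divisions (N0 = 1, m = 2)."""
    n_t = len(counts)
    growth_rate = math.log(n_t) / t_end
    mean_back = sum(counts) / n_t
    mean_for = sum(k * 2.0 ** (-k) for k in counts)
    return growth_rate, mean_for, mean_back


def log_ratio(counts, k):
    """ln[p_for(K) / p_back(K)] for a value K present in the tree (N0 = 1)."""
    n_k = sum(1 for c in counts if c == k)
    p_back = n_k / len(counts)
    p_for = n_k * 2.0 ** (-k)
    return math.log(p_for / p_back)


T_SIM = 6.0
COUNTS = simulate_size_model(t_end=T_SIM)
RATE, K_FOR, K_BACK = division_statistics(COUNTS, T_SIM)
```

Plotting `log_ratio(COUNTS, k)` against `k` gives a straight line of slope $-\ln 2$ crossing zero at $K = t\Lambda_t/\ln 2$, and the means satisfy $\langle K\rangle_{\mathrm{for}} \ln 2 \le t\Lambda_t \le \langle K\rangle_{\mathrm{back}} \ln 2$ whatever the value of `alpha`.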

Phenotypic fitness landscapes
The fitness of a phenotypic trait $s$ is a measure of the reproductive success of individuals carrying it. It is usually defined as the number of offspring of one individual with a given value of the trait and is quite difficult to evaluate. Nozoe et al. suggested that one way to measure it could be to compare the chronological and retrospective marginal probabilities 13 and accordingly defined it as:
$$h(s) = \Lambda_t + \frac{1}{t}\ln\frac{p_{\mathrm{back}}(s,t)}{p_{\mathrm{for}}(s,t)}, \tag{31}$$
so that
$$p_{\mathrm{back}}(s,t) = p_{\mathrm{for}}(s,t)\, e^{t\left(h(s) - \Lambda_t\right)}. \tag{32}$$
This has again the form of a fluctuation relation similar to Eq. (11), except for the replacement of the factor $K \ln 2/t$ by the function $h(s)$. This suggests that the fitness landscape $h(s)$ plays a role similar to that of an effective division rate, which depends on the trait $s$. In line with this interpretation, in the particular case where $s = K$, Eq. (11) leads to $\tilde h(K) = K \ln 2/t$, where the fitness landscape for trait $K$ is called the lineage fitness and is written $\tilde h$. In a branched tree, lineages with a large number of divisions $K$ are exponentially over-represented in the population with the backward sampling as compared to the forward sampling. This means that lineages with large $K$ have a larger fitness than the ones with a small $K$, which is consistent with $\tilde h(K)$ being an increasing function of $K$.
In the following, we rewrite the definition of $h(s)$ in a slightly different way 17 using
$$p_{\mathrm{back}}(s,t) = e^{-t\Lambda_t}\, p_{\mathrm{for}}(s,t) \sum_K 2^{K} R_{\mathrm{for}}(K|s), \tag{33}$$
where we have introduced the probability of the number of division events conditioned on trait $s$ at the forward level, $R_{\mathrm{for}}(K|s)$. Lastly, the fitness landscape reads 17
$$h(s) = \frac{1}{t}\ln\left[\sum_K 2^{K} R_{\mathrm{for}}(K|s)\right]. \tag{34}$$
An increasing or decreasing fitness landscape means a positive or negative correlation of the trait value with the capacity to divide, whereas a constant fitness landscape means that the trait is not correlated with the number of divisions. Indeed, if we consider a trait $s$ which does not affect the number $K$ of divisions, then $R_{\mathrm{for}}(K|s) = p_{\mathrm{for}}(K)$ and Eq. (34) reads $h(s) = \ln\left[\sum_K 2^{K} p_{\mathrm{for}}(K)\right]/t$, which is equal to $\Lambda_t$ according to Eq. (5). In that case, we find that the backward and forward probabilities for that trait $s$ are equal.
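To make the two equivalent expressions of the fitness landscape concrete, the following sketch evaluates $h(s)$ for a discrete (or binned) trait on a small tree, once from the conditional statistics of $K$ at fixed trait and once from the ratio of backward to forward probabilities. Binary division ($m = 2$) is assumed, and the function names, the toy tree and the trait labels are hypothetical.

```python
import math


def fitness_landscape(counts, traits, t, n0=1):
    """Return {trait value: h} using h(s) = ln(sum_K 2**K R_for(K|s)) / t.

    counts[i] is the number of divisions of lineage i and traits[i] a discrete
    (or binned) trait value recorded on that lineage at time t.
    """
    p_for = {}          # forward probability of each trait value
    weighted = {}       # forward-weighted sum of 2**K for each trait value
    for k, s in zip(counts, traits):
        w_for = 2.0 ** (-k) / n0
        p_for[s] = p_for.get(s, 0.0) + w_for
        weighted[s] = weighted.get(s, 0.0) + w_for * 2.0 ** k
    return {s: math.log(weighted[s] / p_for[s]) / t for s in p_for}


def fitness_from_ratio(counts, traits, t, n0=1):
    """Same landscape from h(s) = Lambda_t + ln(p_back(s) / p_for(s)) / t."""
    n_t = len(counts)
    growth_rate = math.log(n_t / n0) / t
    p_for, p_back = {}, {}
    for k, s in zip(counts, traits):
        p_for[s] = p_for.get(s, 0.0) + 2.0 ** (-k) / n0
        p_back[s] = p_back.get(s, 0.0) + 1.0 / n_t
    return {s: growth_rate + math.log(p_back[s] / p_for[s]) / t for s in p_for}


# Hypothetical tree with four lineages and a two-valued trait.
EX_COUNTS = [1, 2, 3, 3]
EX_TRAITS = ["A", "B", "A", "B"]
H_SUM = fitness_landscape(EX_COUNTS, EX_TRAITS, t=1.0)
H_RATIO = fitness_from_ratio(EX_COUNTS, EX_TRAITS, t=1.0)
```

Both routes give the same values on any complete tree; the first needs only forward-weighted data, while the second needs both samplings.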
In the next sections, we evaluate the relevance of the key variables from our model, namely the size and the age by evaluating their fitness landscapes in size and age models.
Size models. We start with a case where the fitness landscape is fully solvable, namely a size model with no individual growth rate fluctuations and with symmetric division. Let us consider a colony starting with one ancestor cell of size $x_0$. Then, the available sizes at time $t$ are discrete and given by $x = x_0 \exp[\nu t]/2^{K}$, where $K$ is the number of divisions undergone by the cell. Therefore a particular size $x$ can be reached only if there is an integer $K$ satisfying this relation, and this integer is unique, leading to
$$R_{\mathrm{for}}(K|x) = \delta\!\left(K - \log_2\frac{x_0\, e^{\nu t}}{x}\right). \tag{35}$$
Using this relation in Eq. (34), one finds
$$h(x) = \nu + \frac{1}{t}\ln\frac{x_0}{x}. \tag{36}$$
The fitness landscape of the size is a decreasing function, which is consistent with the over-representation of cells that divided a lot, since these cells are more likely to be small due to the numerous divisions. Inserting this result in Eq. (33), we obtain a fluctuation relation for the size,
$$p_{\mathrm{back}}(x,t) = \frac{x_0}{x}\, e^{t(\nu - \Lambda_t)}\, p_{\mathrm{for}}(x,t), \tag{37}$$
which in the long time limit becomes
$$p_{\mathrm{back}}(x,t) = \frac{x_0}{x}\, p_{\mathrm{for}}(x,t), \tag{38}$$
where we used the property that in a steady state, the population growth rate and the individual growth rate are equal when there is no individual growth rate variability.
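As a quick numerical check of this solvable case, the sketch below evaluates the size fitness landscape, written here as $h(x) = \nu + \ln(x_0/x)/t$, on the accessible sizes $x = x_0 e^{\nu t}/2^{K}$ and compares it with the lineage fitness $K \ln 2/t$. It assumes a single ancestor, deterministic exponential growth and symmetric division; the function names are illustrative.

```python
import math


def accessible_size(k, x0, nu, t):
    """Size at time t of a cell whose lineage divided k times."""
    return x0 * math.exp(nu * t) / 2 ** k


def size_fitness(x, x0, nu, t):
    """h(x) = nu + ln(x0 / x) / t for deterministic growth and symmetric division."""
    return nu + math.log(x0 / x) / t


def lineage_fitness(k, t):
    """Lineage fitness K ln 2 / t for binary division."""
    return k * math.log(2) / t


# Values of h on the first few accessible sizes (nu = 1, x0 = 1, t = 5).
H_VALUES = [size_fitness(accessible_size(k, 1.0, 1.0, 5.0), 1.0, 1.0, 5.0)
            for k in range(10)]
```

Each accessible size maps onto one value of $K$, so the two landscapes agree point by point, and smaller cells have a larger fitness.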
In some setups, experiments do not start with a unique ancestor cell but with $N_0 > 1$ initial cells, with possibly heterogeneous sizes. We describe this heterogeneity by the average initial size $\langle x_0\rangle$ and the standard deviation $\sigma_{x_0}$. In this case, accessible sizes are still discrete but depend on both the number of divisions and the initial cell that started the lineage, and are expressed as $x_0^i \exp[\nu t]/2^{K}$, where $K$ takes integer values from 0 to $\infty$ and where $x_0^i \in X_0$, with $X_0$ the set of initial sizes. Consequently, a final size $x$ can possibly be reached by different couples $(K_i, x_0^i)$. In order to go further, we now introduce explicitly the initial sizes $x_0^i$ in Eq. (34) as
$$h(x) = \frac{1}{t}\ln\left[\sum_{x_0^i \in X_0} R_{\mathrm{for}}(x_0^i|x) \sum_K 2^{K} R_{\mathrm{for}}(K|x, x_0^i)\right]. \tag{39}$$
When conditioning on the initial size $x_0^i$, there is only one possible number of divisions $K$ to reach size $x$, so that $R_{\mathrm{for}}(K|x, x_0^i)$ obeys an equation similar to Eq. (35). Let us examine two limit cases: (i) small variability in the initial sizes and (ii) large variability in the initial sizes.
Case (i) is characterized by a small number $N_0$ of initial cells and a small coefficient of variation $\sigma_{x_0}/\langle x_0\rangle$. In this case, it is realistic to say that a final size $x$ can only be reached by one couple $(K^*, x^*)$, because the sets of accessible sizes generated by each initial cell do not overlap. Therefore, $R_{\mathrm{for}}(x_0^i|x) = \delta(x_0^i - x^*)$, and for any final size $x$ only one initial size $x^*$ survives in the sum, so that Eq. (39) reads $h(x) = \nu + \ln\left[x^*/(x^* \exp[\nu t]/2^{K})\right]/t = \tilde h(K)$. Thus cells that come from lineages with the same number of divisions $K$ have the same fitness landscape value $h(x)$ for the size, regardless of the size $x^*$ of the initial cell of their lineages. The available values for $h(x)$ are therefore quantized by $K$ and form plateaus, where points representing cells coming from different ancestors but with the same number of divisions accumulate, as shown in Fig. 4a.
Case (ii) is characterized by a large number $N_0$ of initial cells and a large coefficient of variation $\sigma_{x_0}/\langle x_0\rangle$. Unlike in case (i), the sets of accessible sizes generated by each initial cell have many overlaps, so that a final size $x$ can be reached by many different couples $(K_i, x_0^i)$. We make the hypothesis that a final size $x$ can be reached by any initial cell with uniform probability, so that $R_{\mathrm{for}}(x_0^i|x) = 1/N_0$. Therefore, Eq. (39) becomes
$$h(x) = \nu + \frac{1}{t}\ln\frac{\langle x_0\rangle}{x}. \tag{40}$$
Fig. 4b confirms that the plateaus observed in case (i) are replaced by a smooth curve depending on the mean initial size. We observe the same effect, namely the loss of the plateaus, when introducing fluctuations in individual growth rates.
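Age models. Constant individual growth rate. We consider the case where the individual growth rate is constant and equal to $\nu$. In steady state, the forward age distribution reads (see 17, where $p_{\mathrm{for}}(a)$ and $p_{\mathrm{back}}(a)$ were denoted $p(a)$ and $P(a)$, respectively):
$$p_{\mathrm{for}}(a) = p_{\mathrm{for}}(0)\exp\left[-\int_0^a B(a')\,\mathrm{d}a'\right]. \tag{41}$$
To find the integration constant $p_{\mathrm{for}}(0)$, we use the normalization of the probability $p_{\mathrm{for}}$:
$$p_{\mathrm{for}}(0) = \frac{1}{Z}, \qquad Z = \int_0^{\infty} \mathrm{d}a\, \exp\left[-\int_0^a B(a')\,\mathrm{d}a'\right]. \tag{42}$$
Similarly, the steady-state backward distribution of ages reads
$$p_{\mathrm{back}}(a) = p_{\mathrm{back}}(0)\exp\left[-\Lambda a - \int_0^a B(a')\,\mathrm{d}a'\right]. \tag{43}$$
In this case, the integration constant $p_{\mathrm{back}}(0)$ can be expressed either using the normalization of $p_{\mathrm{back}}(a)$, as done for the forward case, or using $p_{\mathrm{back}}(0) = 2\Lambda$, as shown in 17.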
Therefore, the ratio of the age distributions using the backward and forward statistics reads
$$\frac{p_{\mathrm{back}}(a)}{p_{\mathrm{for}}(a)} = 2\Lambda Z\, e^{-\Lambda a}, \tag{44}$$
where $Z$ is defined in Eq. (42) and only depends on the division rate $B(a)$. This relation has a form similar to the relation derived by Hashimoto et al. 12 for the distributions of generation times, except for the extra age-independent factor $\Lambda Z$. Finally, the fitness landscape reads
$$h(a) = \Lambda + \frac{1}{t}\left[\ln(2\Lambda Z) - \Lambda a\right]. \tag{45}$$
The initial condition does not play any role in this derivation; therefore, unlike for size models, the results obtained are unchanged for any number $N_0$ of initial cells with heterogeneous initial ages.
The above calculation is general because we did not put any constraint on $B(a)$. Let us now go into more detail by choosing a power law for the division rate: $B(a) = \nu a^{\alpha}$. In this case, the integral of Eq. (42) is solvable and gives
$$Z = \left(\frac{\alpha+1}{\nu}\right)^{\frac{1}{\alpha+1}} \Gamma\!\left(\frac{\alpha+2}{\alpha+1}\right) \tag{46}$$
in terms of the Gamma function $\Gamma(x)$. Results are plotted in Fig. 5a, which shows that theoretical predictions for the backward and forward age distributions are in good agreement with the numerical histograms. The inset plot shows the age fitness landscape, which follows the linear behavior predicted by Eq. (45).
Let us examine the particular case of uncontrolled models, for which the division rate is constant: $B = \nu$. This corresponds to the case $\alpha = 0$ in the power law analysis conducted above. Replacing $\alpha$ by 0 in Eq. (46) leads to $Z = 1/\nu$; moreover in steady state $\Lambda = \nu$, so that
$$h(a) = \nu + \frac{1}{t}\left(\ln 2 - \nu a\right). \tag{47}$$
Moreover, the distributions themselves are greatly simplified and read
$$p_{\mathrm{for}}(a) = \nu\, e^{-\nu a}, \qquad p_{\mathrm{back}}(a) = 2\nu\, e^{-2\nu a}, \tag{48}$$
which shows that in this special case the age distributions are identical to the generation time distributions.
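The normalization constant can be cross-checked numerically. With $B(a) = \nu a^{\alpha}$, the substitution $u = \nu a^{\alpha+1}/(\alpha+1)$ turns $Z = \int_0^{\infty} \exp[-\nu a^{\alpha+1}/(\alpha+1)]\,\mathrm{d}a$ into a Gamma integral, which suggests the closed form $Z = [(\alpha+1)/\nu]^{1/(\alpha+1)}\,\Gamma[(\alpha+2)/(\alpha+1)]$. The sketch below compares this expression with a plain trapezoidal integration; the function names, the cutoff `a_max` and the number of steps are arbitrary choices.

```python
import math


def z_closed_form(alpha, nu):
    """Z = ((alpha + 1) / nu)**(1 / (alpha + 1)) * Gamma((alpha + 2) / (alpha + 1))."""
    return ((alpha + 1.0) / nu) ** (1.0 / (alpha + 1.0)) * math.gamma(
        (alpha + 2.0) / (alpha + 1.0)
    )


def z_numerical(alpha, nu, a_max=20.0, steps=20_000):
    """Trapezoidal estimate of the integral of exp(-nu a**(alpha+1) / (alpha+1))."""
    da = a_max / steps
    total = 0.0
    for i in range(steps + 1):
        a = i * da
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-nu * a ** (alpha + 1.0) / (alpha + 1.0))
    return total * da


def age_ratio(a, growth_rate, z):
    """Ratio p_back(a) / p_for(a) = 2 * Lambda * Z * exp(-Lambda * a)."""
    return 2.0 * growth_rate * z * math.exp(-growth_rate * a)
```

For $\alpha = 0$ the closed form reduces to $Z = 1/\nu$, and with $\Lambda = \nu$ the ratio of the age distributions becomes $2e^{-\nu a}$, as expected from the two exponential distributions of the uncontrolled model.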
Figure 5. In (a), the forward (resp. backward) age distribution is shown with an orange filled histogram (resp. a blue empty histogram). The red and green curves are the corresponding theoretical predictions. The inset shows the age fitness landscape (purple crosses) and the theoretical linear law (green). The horizontal black dashed line represents the population growth rate $\Lambda$. The simulation was conducted with $B(a) = \nu a^{\alpha}$, $\nu = 1$, $\alpha = 2$, $t = 12$ and $N_0 = 1$. In (b), the age fitness landscapes are shown for models without (purple dots) and with (pink squares) mother-daughter correlations in individual growth rates. The green line of slope $-\Lambda$ fits the purple dots well. For both models, we chose a Gaussian distribution for the growth rate of a newborn cell, of standard deviation $\sigma_\nu$, centred either on the mother growth rate $\nu'$ for models with correlations or on a constant mean growth rate $\nu_m$ for uncorrelated models. The simulation was conducted with $\nu_m = 1$, $\sigma_\nu = 0.4$, $\alpha = 5$, $t = 12$ and $N_0 = 1$.

Scientific Reports | (2020) 10:11889 | https://doi.org/10.1038/s41598-020-68444-x

Fluctuating individual growth rates. Another interesting extension of this calculation concerns the case of a fluctuating individual growth rate $\nu$, for which the division rate becomes a function of both $a$ and $\nu$: $B(a, \nu)$. The steady-state age distributions then read 17:

$$p_{\rm for}(a,\nu) = p_{\rm for}(0,\nu)\,\exp\left[-\int_0^a B(a',\nu)\,da'\right], \qquad p_{\rm back}(a,\nu) = p_{\rm back}(0,\nu)\,\exp\left[-\Lambda a-\int_0^a B(a',\nu)\,da'\right],$$

where $p_{\rm for}(0, \nu)$ and $p_{\rm back}(0, \nu)$ are given by the boundary conditions at division, which involve the distribution of the newborn growth rate $\nu$ conditioned on the mother growth rate $\nu'$. In the absence of mother-daughter correlations for the individual growth rate, this conditional distribution does not depend on $\nu'$, which implies that $p_{\rm for}(0, \nu)$ and $p_{\rm back}(0, \nu)$ have the same dependency in $\nu$. Finally, the fluctuation relation for the age reads

$$\frac{p_{\rm back}(a,\nu)}{p_{\rm for}(a,\nu)} = \frac{p_{\rm back}(0,\nu)}{p_{\rm for}(0,\nu)}\,e^{-\Lambda a},$$

with a prefactor that is independent of $\nu$. This is the equivalent of Eq. (44) for fluctuating growth rates without mother-daughter correlations. Therefore, the age fitness landscape features the same linear dependency in age, with a slope $-\Lambda$, as in the case of constant individual growth rate.
In the general case with mother-daughter correlations, however, this statement is not necessarily true, because $p_{\rm for}(0, \nu)$ and $p_{\rm back}(0, \nu)$ do not in general have the same dependency in $\nu$.
Consequently, the slope of the age fitness landscape reveals the presence of mother-daughter correlations, as illustrated numerically in Fig. 5b: the age fitness landscape for models without mother-daughter correlations aligns with the theoretical prediction of slope $-\Lambda$, while the same function for models with mother-daughter correlations presents a non-linear age dependency.
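In practice, this diagnostic amounts to estimating the slope of $\ln[p_{\rm back}(a)/p_{\rm for}(a)]$ as a function of age from two samples of ages. The sketch below is one hypothetical way to do this with binned histograms and a least-squares fit; it is not the procedure used for Fig. 5. The function name, bin settings and count threshold are assumptions, and the synthetic test data are drawn from the constant-rate case ($p_{\rm for}(a)=\nu e^{-\nu a}$, $p_{\rm back}(a)=2\nu e^{-2\nu a}$ with $\nu=1$), for which the expected slope is $-\Lambda=-1$.

```python
import math
import random

def log_ratio_slope(forward_ages, backward_ages, a_max=2.0, n_bins=10, min_count=50):
    """Least-squares slope of ln[p_back(a)/p_for(a)] versus age from two samples."""
    width = a_max / n_bins

    def histogram(ages):
        counts = [0] * n_bins
        for a in ages:
            k = math.floor(a / width)
            if 0 <= k < n_bins:       # ages outside [0, a_max) are ignored
                counts[k] += 1
        return counts

    c_for, c_back = histogram(forward_ages), histogram(backward_ages)
    n_for, n_back = len(forward_ages), len(backward_ages)
    xs, ys = [], []
    for k in range(n_bins):
        # Keep only bins populated well enough in both samples.
        if c_for[k] >= min_count and c_back[k] >= min_count:
            xs.append((k + 0.5) * width)
            ys.append(math.log((c_back[k] / n_back) / (c_for[k] / n_for)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic ages for the constant-rate model with nu = 1 (illustrative data only).
rng = random.Random(0)
forward = [rng.expovariate(1.0) for _ in range(100000)]
backward = [rng.expovariate(2.0) for _ in range(100000)]
print(log_ratio_slope(forward, backward))
```

A clear departure of the binned log-ratio from a straight line of slope $-\Lambda$ would then point to mother-daughter correlations, although finite-sample noise in sparsely populated bins should be kept in mind when interpreting such a fit.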

Discussion
We have studied the relation between two different samplings of lineages in a branched tree. The backward (or retrospective) sampling presents a statistical bias with respect to the forward (or chronological) sampling, an observation which is important for relating experiments carried out at the population level to those carried out at the single-lineage level. This statistical bias can be rationalized by a set of fluctuation relations, which relate the probability distributions in the two ensembles and which are similar to fluctuation relations known in stochastic thermodynamics. This analogy leads to efficient methods to infer the population growth rate from an analysis of lineages, as we demonstrated with the mother machine data of Tanouchi et al. 20. Important inequalities for the mean number of divisions or the mean generation times follow from these fluctuation relations, which have been verified experimentally 12 for various strains of E. coli. It would be interesting to generalize these studies to other cell types. In the particular context of this paper, it would also be useful to perform experimental studies in bulk or semi-open configurations, in order to test the predictions which involve a comparison between forward and backward samplings.
By measuring the difference between these two samplings for a specific trait, one obtains the fitness landscape introduced by Nozoe et al. 13. While these authors applied this concept to variables which are not reset or redistributed at division, in the present paper we used the concept of fitness landscape for variables like the size and the age, which are precisely reset at division in size and age models. We derived expressions for these fitness landscapes, which agree with the statistical bias expected when measuring size or age distributions in cell populations. In addition, we find that the precise form of the age fitness landscape indicates whether or not mother-daughter correlations are present in age models.
In the future, it would be valuable to extend our approach to include other important phenotypic state variables besides size or age, such as variables controlling replication dynamics 3,27 . We hope that our work contributes to clarifying the connection between single lineage and population statistics and to understanding the fundamental constraints which cell growth and division must obey.