Lineage marker synchrony in hematopoietic genealogies refutes the PU.1/GATA1 toggle switch paradigm

Molecular regulation of cell fate decisions underlies health and disease. To identify molecules that are active or regulated during a decision, and not before or after, the decision time point is crucial. However, cell fate markers are usually delayed and the time of decision therefore unknown. Fortunately, dividing cells induce temporal correlations in their progeny, which allow for retrospective inference of the decision time point. We present a computational method to infer decision time points from correlated marker signals in genealogies and apply it to differentiating hematopoietic stem cells. We find that myeloid lineage decisions happen generations before lineage marker onsets. Inferred decision time points are in agreement with data from colony assay experiments. The levels of the myeloid transcription factor PU.1 do not change during, but long after the predicted lineage decision event, indicating that the PU.1/GATA1 toggle switch paradigm cannot explain the initiation of early myeloid lineage choice.

: Inference of lineage choice from trees can accurately reconstruct the underlying differentiation and marker delay dynamics. A,B) For a given set of parameters θ, η, the differentiation probability distribution Φ (gray, upper panel) and the marker delay probability distribution Ψ (gray, lower panel) are shown. We simulate 50 trees (25 shown in B) from these parameters and apply the tree inference algorithm to obtain estimatesθ,η and distributionsΦ,Ψ (solid lines), which match the true distributions. C) Histogram of the difference in time between predicted and true differentiation timepoints for an independent test set of 100 trees simulated from the true distributions (gray) in A.   Figure 11 and main text) and a different set of parameters. A) The model gives rise to a similar attractor landscape as before (compare to Supplementary Figure 11B). However, overall protein numbers are increased. B) Estimated differentiation and marker delay probability distributions from 100 simulated observed trees. C) Trajectories of predicted cells are centered on the timepoint of lineage choice (t = 0). D) Trajectories of cells whose progeny differentiates into both fates (see inset). Parameters used for simulation are γ X = γ Y = 0.29 h −1 (degradation rate), α X = α Y = 1440 h −1 (maximal synthesis rate), n = 2 (cooperativity), and K X = K Y = 1000 (dissociation constants), x * = 4500 (detection threshold).

Model
We propose a simple model of lineage choice and delayed readout (see main text, Materials and Methods).

Differentiation process
Assume that a cell is born at time t 0 and divides at time t 1 and that the differentiation rate is λ(t). What is the probability that this cell does not differentiate within its lifetime [t 0 , t 1 ]? In general, the number of events k in [t 0 , t 1 ] follows a Poisson distribution We are interested in the case of no event occurring, hence: Similarly, the probability Φ(t) of that cell to differentiate exactly at time t ∈ [t 0 , t 1 ] is the product of the probability to not differentiate up to time t and the instantaneous differentiation probability λ(t), i.e.

Marker protein dynamics
The marker protein dynamics of the simplified gene expression system is modeled by the two reactions (see section 2.1.2 in the main paper) where the first reaction creates new proteins with rate α and the second reaction degrades proteins with rate γ. The Chemical Master Equation describes how the distribution of protein numbers x evolves over time [2]: This is a linear differential equation and we can also write it in matrix form: where we choose an ordering of the vector P(t), so that the i-th component corresponds to P(i − 1, t), e.g. P 1 (t) is the probability to have 0 proteins at time t. According to this ordering, the matrix A is defined as: We are only interested in the dynamics of the system until the protein numbers exceed the detection threshold x * . We apply the finite state projection [3] to the master equation, truncating the statespace at state x * − 1 and introducing the absorbing state x * (see Supplementary Figure 5). This is readily achieved by modifying the matrix A such that or, equivalently, by extending Eq. 2 with separate equations for the states x * and x * − 1: Hence, all probability that leaves the truncated statespace will be collected in the absorbing state x * . The first passage time distribution Ψ x0 (t), that is, the probability that the protein number crosses the threshold x * for the first time at time t starting in state x 0 equals the probability flow from state x * − 1 to x * at time t [4]: To obtain Ψ x0 (t) we have to solve the Chemical Master Equation (Eq. 2) up to time t with initial condition P(x, 0) = δ x,x0 . For this simple stochastic model, an analytic expression for the first passage time distribution can be derived in terms of a renewal equation (see supplement of [5]). However, when dealing with tree structures (see Methods section of the main paper), we have to solve the master equation numerically using standard ODE solvers to also obtain the propagator P x→x (t), that is the probability to start at state x and after time t arrive at state x , which is not analytically available for a truncated statespace.

Derivation of the model likelihood
Our goal is to estimate the parameters (λ(t), α, γ, x * ) of the model (Eq. 1-3) from observed lineage trees in order to predict the lineage choice in a given tree. To that end, we derive the likelihood L of an observed tree T given the parameters (λ(t), α, γ, x * ), which is then optimized to find the maximum likelihood estimates. The likelihood of the observed tree T given parameters (λ(t), α, γ, x * ) is the sum of likelihoods of the hidden trees H ∈ H(T ), because these are competing alternatives: Here, we derive the likelihood L(H|λ(t), α, γ, x * ) of a single hidden tree H. We partition the hidden tree into the subtrees D i induced by the differentiating cells, and a single tree U that contains only undifferentiated cells (see Fig. 2D of the main paper). Due to the Markov property, the likelihood L(H|λ(t), α, γ, x * ) factorizes: Note that λ(t) also appears in the likelihoods for D i as the root of these subtrees is still undifferentiated for some unknown time (Fig. 2D of the main paper). The first term is readily computed Strasser et al. - Supplementary Information  16 as the process generating it has no memory and therefore: where c is a cell within U that is born at time t The terms L(D i |λ(t), α, γ, x * ) in Eq. 5 are more difficult to obtain, as the delay process has memory and hence the individual cells of the subtree cannot be treated independently. Also, one has to account for the unknown time interval where the root of the subtree is still undifferentiated. However, these terms can be calculated efficiently using factor graphs (see section 2.2) Putting together Eq. 4 -6, we find the likelihood of an observed tree T to be: The sum over H in Eq. 8 consists of a large number of terms (it is double exponential in the number of cells [6]), hence an explicit summation is prohibitive for larger trees. However the sum can efficiently be evaluated using dynamic programming (see section 2.3).

Factor graph representation of
To obtain the likelihood L(D i |λ(t), α, γ, x * ), we represent each tree D i as a factor graph (Supplementary Figure 6) and perform inference via message passing [7]. For each non-root cell c in D i , we create two nodes, one representing the state X c ∈ {0, 1, . . . , x * } of the cell c at its first timepoint, the other representing its state Y c ∈ {0, 1, . . . , x * } before division (Supplementary Figure 6). One additional node τ c per cell represents the lifetime of that cell (black circles in Supplementary Figure 6). These three nodes associated with cell c are linked via the factor f c (squares in Supplementary Figure 6), which expresses the probability to reach state Y c in time τ c starting from state X c . The factor f c is thus the propagator of the associated Markov process: f c (X c , Y c , τ c ) = P Xc→Yc (τ c ). It is obtained by solving the Chemical Master Equation (Eq. 2) numerically. Individual cells are linked via cell division factors g c (diamonds in Supplementary Figure 6) that couple the last state of the mother (Y c ) to the first states (X d1 , X d2 ) of the two daughters d 1 , d 2 . For simplicity, we neglect partitioning of molecules at cell division and assume that this factor is the identity: g c (Y c , X d1 , X d2 ) = δ Yc,X d 1 · δ Yc,X d 2 . This ensures that both subtrees inherit the same state from the mother cell. This can be extended by binomial partitioning of molecules [8], e.g. via defining the factor g m relating the number of proteins in the mother cell m and its two daughter cells d 1 , d 2 as Finally, the root cell r is represented by node Y r , representing the state before division and is coupled to a factor f r implementing the integration over the unknown timepoint of differentiation within r: The first term in the integral gives the probability that the differentiation occurred at time t and the second term is the probability to reach protein number Y r in the remaining cell cycle starting from zero proteins. In this factor graph, which compactly describes the joint distribution over all variables P (X 1 , . . . , X n , Y 1 , . . . , Y n , τ 1 , . . . , τ n ) , certain nodes are observed (filled nodes in Supplementary Figure 6): We observe the cell cycle length τ i of all cells and we also know the protein number of the leave cells l, because we observe the marker in these cells: i.e. one hast to integrate the joint distribution over all unobserved nodes. The evidence is calculated with the sum-product algorithm on this factor graph [7] (see section 2.2.1).

Message Passing
To evaluate the likelihood L(D i |λ(t), α, γ, x * ) of the subtrees D i , we perform message passing on the factor graph. Using the notation of [7], we define messages µ X→f going from node X to factor f as The factors l are all factors connected to node X except factor f (ne(x) denotes neighbors in the graph). Messages going from a factor f to a node X are defined as where x 1 , . . . , x M are nodes connected to factor f in addition to x and Y denotes all nodes that are connected to factor f except X. For a detailed derivation of these expressions, we refer the interested reader to Chapter 8.4 of [7]. In the following, we give a brief example of the message passing performed on the graphical model shown in Supplementary Figure 6A, where marker onset is observed in cells 4 and 5. Note that for readability, we ignore the lifetime nodes (black circles in Supplementary Figure 6B) in the message passing scheme and simply treat them as constants in the associated factors. Also we omit the dependence of the factors on the parameters (λ(t), α, γ, x * ).
• Next, we send a message from f 4 to X 4 using Eq. 11: where in the last line, we used the definition of the factor f 4 as the propagator of the Markov process. The same procedure is applied to obtain the message µ f5→X5 (x 5 ).
• In the next step, messages µ X4→g2 (x 4 ) and µ X5→g2 (x 5 ) are sent to the cell division factor g 2 . From the definition in Eq. 10, we see that the nodes X 4 , X 5 simply pass through the incoming messages unchanged: • Again applying Eq. 11, we find that the cell division factor g 2 sends to Y 2 the message µ g2→Y2 (y 2 ) = x4 x5 Here, we used the assumption of perfect state inheritance from mother to daughter as described before.
• Finally, a message has to be sent from the integration factor f 2 towards Y 2 : • To obtain the likelihood of the whole subtree, we have to multiply all messages arriving at Y 2 and sum over all states y 2 : L(D 2 ) = y2 µ f2→Y2 (y 2 ) · µ g2→Y2 (y 2 ) .

Resolving the combinatoric complexity
In section 2.1, Eq. 8, we derived the likelihood of one observed tree T given the model parameters (λ(t), α, γ, x * ) as the sum over all hidden trees H(T ): The sum over H(T ) contains a potentially large number of terms (the size of H(T ) it is doubly exponential in the number of generations [6], or equivalently, it is exponential in the number of cells). For a full binary tree with n generations, the number of hidden trees is equivalent to the number y n of strongly binary trees (rooted trees whose nodes have either zero or two children) with generations ≤ n. This number is defined by the quadratic map y n = y 2 n−1 + 1 with y 0 = 1. For example, for a tree with five generations this evaluates to y 5 = 458330 hidden trees.
Hence, carrying out the sum explicitly is prohibitive for large trees. We thus devised a dynamic programming approach that avoids the explicit enumeration of hidden trees H. First we introduce two convenient abbreviations: is the probability of the subtree rooted in cell i, which we obtain by inference on the factor graph (see section 2.2.1). Furthermore, we definẽ L(c|λ(t)) .

Strasser et al. -Supplementary Information
Algorithm 1: Recursive algorithm to calculate LA(i). We assume an arbitrary but fixed ordering of the tree, i.e. we can rely on the definition of a "left" child cell. // continue recursion in mother cell end Here, LA(i) is defined via Algorithm 1 and denotes a particular set of undifferentiated ancestor cells, such that every undifferentiated cell in the tree is contained in only one set LA(i). L(c|λ(t)) denotes the probability of cell c being undifferentiated (see Eq. 7). Now we define κ(i), which aids us in calculating the value of the overall sum in Eq. 8: where L denotes the set of leaves of T and v, w are daughters of cell i. If cell i is a leave, the probability is just the product of the subtree probability and the probability to be still undifferentiated (corrected for the multiple counting, henceP ). If cell i is not a leave, its contribution to the combinatorics is the following: Either it differentiates and generates the subtree below (first term) or it just leads to the combinations of the two subtrees (starting at cells v, w) below (resulting in all possible combinations of the events in the subtree). Finally, we're only interested in and we can use the recursive rule Eq. 12 to calculate the likelihood of an observed tree (Eq. 8) efficiently.

Finding the most likely hidden tree
Given an observed tree T and a set of parameters (λ(t), α, γ, x * ) the most likely hidden treeĤ can be determined according to Eq. 5 of the main text: This can be done efficiently without explicitly enumerating the entire set H(T ), but using the quantities κ(i), S(i) defined in section 2.3. The key idea is that κ(i) represents an upper bound on the contribution of the subtree rooted in i (where cell i or any set of descendants of i differentiates) to the likelihood. If the currently best solution H * already has larger likelihood than κ(i), all solutions with differentiation events downstream of cell i can be removed from the search space. Pseudocode is shown in Algorithm 2, which yieldŝ where r is the root of T . A similar algorithm can be used to obtain the n-most likely trees H ∈ H and their corresponding likelihoods L(H|λ(t), α, γ, x * ). In Supplementary Figure.   if c ∈ Leaves(T) ; // base case then if S(c) < p * then // worse than current best solution if p l · p r < S(c) then // combining the subtrees is worse than taking c return C = {c}, p = S(c) else // combine the subtrees return C = c l ∪ c r , p = p l · p r end end end 2.5 Accounting for lineage choice before movie start Due to sorting impurity, it is possible that cells entering the time-lapse microscopy experiment at t 0 are already committed towards one or the other lineage. While mature cells that already express lineage markers at t 0 can easily be removed (these cells are usually not even tracked), we still must account for cells that made their lineage choice at some time t < t 0 but not show the lineage markers due to delay. It is straightforward to include this into our model by modifying the likelihood function Eq. 4 as: where π ∈ [0, 1] is the fraction of genealogies that decide their lineage before t 0 and H 0 is the hidden tree corresponding to an already differentiated cell entering the experiment. π is estimated with the other parameters of our model λ(t), α, γ, x * . To calculate L(H 0 |λ(t), α, γ, x * ), notice that there is no need to decompose the hidden tree into undifferentiated and differentiated subtrees. One can directly translate H 0 into a factor graph as described in section 2.2. One only has to modify the expression of the integration factor f r (compare to Eq. 9): where P (x) = 1 x * is a uniform prior distribution over the protein expression state when the cell enters the experiment. Here, we account for the fact that a cell might enter the experiment at

Supplementary Note 1: Inference on synthetic data
We first test our method on synthetic data, choosing parameters θ and η, which give rise to a particular differentiation and delay distribution (see Supplementary Figure 8A, gray). We generate 50 lineage trees according to those distributions, where we do not observe the timepoint of lineage choice, but only the marker onset (Supplementary Figure 8B). By solving the optimization problem in Eq. 4 for our dataset, we obtain a maximum likelihood estimate (θ,η) and compute the corresponding differentiation and delay distributions (Φ,Ψ in Supplementary Figure 8A, solid lines), which closely match the data generating distributions(Φ, Ψ, grey). Furthermore, we predict the most likely hidden trees via Eq. 5 and calculated the distance in time between true and predicted timepoints of lineage choice (Supplementary Figure 8C). The difference in predicted and true timepoint of decision is 1.35 h on average. Thus, our method can accurately recover the underlying parameters and faithfully predict the true timepoint of lineage choice just from observed marker correlations. We now test the predictions of our model trained with the 50 genealogies on a independent test set of 100 genealogies. For each of the 100 trees in the test set, we obtain its most likely hidden tree via Eq. 5, thereby predicting the timepoint of lineage choice. Comparing these predictions to the ground truth, we find that for 77 of 100 observed trees, we predict the correct hidden tree. In terms of single cells, we correctly recall 187 of 229 differentiating cells, while 27 cells are falsely identified as differentiating (for the confusion matrix, see Supplementary Table 1).
To systematically validate the algorithm's performance, we repeat the above analysis for a set of 100 parameters (θ, η) i , (i = 1, . . . , 100), randomly sampled from the intervals given in Supplementary Table 2. We simulate 50 trees T i for each (θ, η) i and perform the maximum likelihood estimation using Eq. 4 (main text). For each set of trees T i , we obtain estimates of the underlying differentiation and delay distributions (Φ i andΨ i ). To measure the agreement between true and estimated distributions (p andp'), we calculate their L 1 -distance as: A histogram of the L 1 -distances between estimated and true distributions is shown in Supplementary Figure 9. The distances in the differentiation distributions are generally small (Quantile 0.95 = 0.25, Supplementary Figure 9A). For the delay distributions these distances are sometimes larger (Quantile 0.95 = 0.70, Supplementary Figure 9B), suggesting that it is more difficult to extract these parameters from the data. Overall, the analysis suggests that we can reconstruct the underlying parameter from the observation with good accuracy.  Until now we have used a simple gene expression model combined with a detection limit of marker onset to explain correlations in trees. Here, we assess if our simple model can cope with a more realistic delay process consisting of a cascade of three genes (Supplementary Figure 10A): Upon lineage choice, expression of the first gene in the cascade (gene A in Supplementary Figure 10A) is triggered, which in turn activates expression of its downstream target (gene B in Supplementary Figure 10A). Gene B in turn activates gene C, whose expression can be detected once crossing a detection limit x * (as in Supplementary Figure 10B). Activation is governed by a Hill function, such that the production rateα B (A) of gene B is an increasing function of the number of activator molecules: with maximal synthesis rates α A = α C = 100 h −1 , α B = 22 h −1 , cooperativity n = 5 and dissociation constants K A = 800, K B = 100. Degradation rates are set to The dynamics of this stochastic process leads to a long and heterogeneous delay ranging from 35 to 60 hours after lineage choice (see Supplementary Figure 10D). We now simulate 50 genealogies from a time-dependent differentiation process (parameters as in Supplementary Figure 8, main text) and the three-gene cascade (a sample of nine genealogies is shown in Supplementary Figure 10C). As before, we fit the model to the data via Eq. 4 (main text). Note that the model still assumes a single gene delay process. Comparing the estimated to the true distributions, we observe good agreement (Supplementary Figure 10D). We note a slight deviation for the delay distribution, which arises due the difference in delay processes (a single gene as opposed to a cascade).
We evaluate the performance of the fitted model on an independent test set of 100 genealogies. For 91 of 100 genealogies, the most likely hidden tree (obtained via Eq. 5, main text) indeed corresponds to the true underlying lineage choice scenario. Performance in terms of single cell prediction is summarized in Supplementary Table 3. Note that due to the long delay many nondifferentiating cells are present, which are mostly classified correctly. In terms of time difference between predicted and actual timepoint of lineage choice, we find that the predicted timepoint is always within 3 hours of the true timepoint (Supplementary Figure 10E) and the misclassifications in Supplementary Table 3 happen close to cell division (where e.g the mother cell might differentiate at the end of its cell cycle, but the methods predicts that its daughter cells difference at the beginning of their cell cycles).   (main text), we predict the differentiating cells and the timepoint of differentiation within those cells. The timecourses of the two proteins X and Y in these predicted cells are then aligned at the predicted differentiation timepoint (t d = 0 on the x-axis of Fig. 4E, main text). This timepoint coincides with the point where trajectories diverge, i.e. the method detects the timepoint where the toggle switch tilts into one or the other direction, corresponding to the lineage choice. To further quantify this, we fit linear models to the timecourse of each single cell, group cells accordingly into generations before, at, and after the predicted differentiation and assess the protein production (slope of the linear fit) of X and Y in these cells (Fig. 4F,G, main text). While the production is zero and protein levels stay constant in generations before the predicted differentiation decision, the production clearly changes in cells that are predicted to differentiate (Fig. 4F,G, main text). For generations after the predicted differentiation decision and at the observed marker onset, production plateaus and decreases as the toggle switch approaches one or the other differentiated state.
We performed similar analyses for a different parameter set (see Supplementary Figure 12) and for a toggle switch coupled to a three-gene marker cascade (see Supplementary Figure 13) and obtained the same result: In the predicted cells the balance of the two toggle switch proteins is broken and the system tilts towards one or the other differentiated state. Thus, our method is able to reconstruct the timepoint of lineage choice even if the decision model is more complex and implemented via a genetic toggle switch.

Supplementary Note 4: Robustness with respect to experimental data
To test whether the results from the hematopoiesis dataset are robust across the n = 3 experiments, we subsampled the dataset, leaving out an entire experiment one at a time. For the resulting three datasets, we performed inference with our model by solving Eq. 4 (main text). The estimated distributions Φ, Ψ are similar across these three runs (Supplementary Figure 14).
Due to the similarly estimated distributions, also the cells predicted to differentiate are mostly the same across the three runs. Most importantly, the slopes of PU.1 show the same behaviour around the predicted time of lineage choice (Supplementary Figure 15).
Predictions are very stable for small perturbation on the order of a few hours (only 3% and 14% of predicted most likely hidden trees change in the GM and MegE datasets at σ = 5h, respectively) and the timing of the predicted lineage choice changes only little, i.e. moving the prediction from the end of a mother to the beginning of a daughter cell. This shows that exact annotation of onsets is not essential. For larger perturbation on the order of one cell cycle (12h), predictions are affected more strongly: the unperturbed hidden tree is less likely to be rediscovered and the time difference between perturbed and unperturbed prediction further increases. Note however that the predictions are still rather stable for the GM-genealogies, where large noise (σ = 21h) shifts the predicted lineage choice by less then 10h. MegE-genealogies are more affected than GMgenealogies. Here, a prediction is often based on only a few onsets and hence are more sensitive to perturbations, whereas in GM-genealogies many onsets and their intricate correlation structure inform a prediction, essentially averaging out the onset perturbations.

Supplementary Note 6: Analysis of the delay-induced correlations
The prediction of lineage choice relies on the correlated marker onsets induced by the stochastic marker delay process (Eq. 2). Naturally, the question arises: When are e.g. two marker onsets considered to be correlated due to the delay and hence downstream of the same lineage choice? To analyze the type of correlations induced by a given delay process, the following quantities are of importance (see Supplementary Figure 17A): i) The time τ from the lineage choice to the last common ancestor of the two onset cells. This is the shared history of the two onset cells. ii) The time from the last common ancestor to the respective onsets in the two cells (t A , t B ). This is the time where the cells evolve independently of each other, starting from a perfectly correlated state at the common ancestor, to increasingly decorrelated states over time. Note that several division can happen between the common ancestor and the onsets. iii) The time difference in the two onsets ∆ = |t A − t B |.
These quantities are related via the following probability distribution: where x is the state of the delay process, P (x|τ ) = P 0→x (τ ) is the distribution of the delay process state right before the two cells split and P (t A |x) = Ψ x (t A ) is the first passage time distribution of the delay process (similarly P (t B |x) = Ψ x (t B )).
In the following, we will take the delay process estimated for the GM-lineage in the main paper (Fig. 3D) and analyze two aspects of Eq. 16 to gain insight about the induced correlations. For small τ , t A and t B are almost independent as the delay process will still be in its initial state (Supplementary Figure 17B, τ = 0h). With increasing τ the delay process advances, coupling the two marker onset cells via their shared delay state x(τ ) and inducing correlations in the variables t A , t B (Supplementary Figure 17B, τ = 24h). The "pressure" for a marker onset increases (moving into the tail of the marker onset distribution) and the mean of the distribution moves towards the origin (Supplementary Figure 17B, τ = 36h). Note that the process induces only weak Pearson correlations (max(ρ) = 0.35 in Supplementary Figure 17B) of onset timing. Only conditioning on the last common ancestor allows a large variety of onset patterns, i.e. for τ = 24h, one onset might occur two generations later (t A = 24h) and its paired onset might happen only four generations later (t B = 48h).
An onset shortly after cell division of the last common ancestor (i.e. small t A or t B ) should be highly informative about the delay state in the last common ancestor, i.e. the process must be close to the state x * . The other sister cell would inherit that state, and hence also should become marker positive soon (small ∆). Hence, we ask: What is the distribution of time difference ∆ conditioned on an onset in the sister cell at t A , i.e. the distribution P (∆|t A , τ ). We estimate P (∆|t A , τ ) by sampling from Eq. 16. For small τ cells will be almost independent and no onsets will happen early on. However, for τ ≥ 24h, we observe that onsets shortly after division (small t A ) indeed produce smaller onset differences (small ∆). For example, with τ = 48h, an early onset in one sister (t A < 10h) will very likely produce an onset in the other sister within a 10h time window (white/yellow area in Fig. 17C, τ = 48h). However, the long tails of P (∆|t A , τ ), as measured by the 90% quantiles (white lines in Supplementary Figure 17C) show that a large variety of onset pairs would be consistent with this delay process (every onset pair that lies below the white line). Due to these weak pair correlations one cannot simply "read off" the timepoints of lineage choice from the observed genealogies.
How can we still make reasonable predictions about the timepoint of lineage choice? Here, it is key to consider not only single pairs of onsets. Any single pair of onsets in a genealogy might be consistent with the delay process (e.g. falling into the 90% quantiles in Supplementary  Figure 17B,C) and hence indicate that the two onsets derive from the same lineage choice. If we consider multiple onset pairs downstream of the same putative lineage choice and they all barely fall within the distribution, that will already indicate that the putative lineage choice is likely inconsistent with the delay process (even though each individual pair is consistent with it). Even more, one would consider not only pairs, but triplets, quadruplets, etc. to see if they are consistent with the delay process. By taking into account the entire correlation structure of onsets in a single genealogy at once, our method can reliably infer the lineage choice even in the presence of weak correlations.

Supplementary Note 7: Analysis of GMMegE genealogies
Genealogies that give rise to both GM and MegE cells are rare in the dataset of [1] (17 across three experiments) due to the tracking strategy and only scarcely tracked. Hence they were not used in estimating the model parameters from the tracked genealogies. However, they provide a test case for our prediction of lineage choice: The common ancestors of a GM-cell and a MegE cell must have been bipotent and hence lineage choice can only occur below these common ancestors.  Figure 18F) indicates lineage choice as late as generation 6. If instead lineage choice would predominantly occur late, one would not see the largely homogeneous (in terms of GM vs MegE) subtrees.