Fate mapping of hematopoietic stem cells reveals two pathways of native thrombopoiesis

Hematopoietic stem cells (HSCs) produce highly diverse cell lineages. Here, we chart native lineage pathways emanating from HSCs and define their physiological regulation by computationally integrating experimental approaches for fate mapping, mitotic tracking, and single-cell RNA sequencing. We find that lineages begin to split when cells leave the tip HSC population, marked by high Sca-1 and CD201 expression. Downstream, HSCs either retain high Sca-1 expression and the ability to generate lymphocytes, or irreversibly reduce Sca-1 level and enter into erythro-myelopoiesis or thrombopoiesis. Thrombopoiesis is the sum of two pathways that make comparable contributions in steady state, a long route via multipotent progenitors and CD48hi megakaryocyte progenitors (MkPs), and a short route from HSCs to developmentally distinct CD48−/lo MkPs. Enhanced thrombopoietin signaling differentially accelerates the short pathway, enabling a rapid response to increasing demand. In sum, we provide a blueprint for mapping physiological differentiation fluxes from HSCs and decipher two functionally distinct pathways of native thrombopoiesis.

Representative gating of RFP + bone marrow populations (right).
c, All possible topologies of lineage pathways connecting S to P1 and P2, and further to M1 through M3. For simplicity, convergence (P1 and P2 spawning the same mature cell type) has not been considered.    k, Inferred cell fluxes in young (black) and old (red) mice for all differentiation steps.
l, Inferred cell numbers in HSPCs in young (black) and old (red) mice.
In k and l, dots are for best-fit parameters and error bars indicate 95% prediction bands.          The mathematical balance equations for the cell numbers are based on the principle that the flux of cells-the number of cells undergoing a certain process (e.g., cell division) per unit time-is proportional to the number of cells that can potentially undergo this process. The proportionality factor is a rate constant that is the inverse of the average waiting time before a cell commits to the process in question (e.g., the average cell cycle time in the case of cell division). Hence, the balance equations are exact in principle. Their precision in describing the actual biology depends on the appropriate definition of cell populations and of the waiting time distributions for the processes described. Here, we use the customary exponential waiting time distributions and base the model on a refined definition of the most upstream hematopoietic populations, HSCs, MPPs and HPC-1, that includes, in addition to the usual markers (LSK, CD48 and CD150) also CD201 (EPCR) for HSCs and Sca-1 surface level for HSCs, MPPs and HPC-1. Thus we go well beyond the resolution of previous studies 2,3 .

5-FU
Specifically, HSCs and progenitors can undergo self-renewal (with rate ff), divide asymmetrically (with rate ‚), divide into two more differentiated daughter cells (with rate ȷ, symmetric differentiating division), differentiate directly (with rate -) or die (with rate ‹) (Supplementary Fig. 4g). Based on these fundamental processes, the population dynamics for a lineage pathway where each product has exactly one precursor are described by the following system of ordinary differential equations: where population 1 are the tip stem cells (ES HSCs) and populations 2; 3; : : : are successive downstream populations. Generalization to pathways where two or more precursors feed into the same product is straightforward. These equations can be rewritten in a simpler form: using two aggregate parameters:¸i where¸i and » i are the total differentiation rate and net cell loss rate of compartment i, respectively.
Based on this generic model, in the following sections we develop a quantitative framework of inferring differentiation pathways from the combination of HSC fate-mapping and cell proliferation data.

Propagation of fate-mapping label
Let m i and n i be the labeled and total cell number of population i. The labeling frequency is defined by f i ≡ m i n i . The dynamics of the labeling frequency is governed by, It is reasonable to assume that heritable labeling (e.g., via RFP expression from the Rosa26 locus) does not alter cell proliferation, differentiation and death rates. Hence we combine Equation 4 with Equation 2 and obtainḟ where for the tip stem cells f 1 = constant, as shown experimentally (Fig. 1d). The new variable r i denotes the size ratio between neighboring compartments: r i ≡ n i n i−1 4 . The generic model can be readily extended to describe more complex lineage trees, e.g. with diverging and converging branches.

Analysis of heritable label propagation between HSC subpopulations
Functionally different subsets may exist in a phenotypically defined cell population such as HSC. Transitions among these subpopulations may be irreversible or reversible. We now show that reversibility would shape the dynamics of fate-mapping label in a manner that would be easily recognizable in experimental data. To characterize the kinetic behavior of the label propagation in this situation, we consider the simplest case in which two subpopulations A and B give rise to to each other, with rates¸A B (for A → B) and¸B A (for B → A). The cell number kinetics of A and B are governed bẏ where » A and » B are the net cell loss rates of A and B, respectively. Based on Equation 4, the label propagation equations can be derived,ḟ where r BA ≡ n B n A . To obtain a steady state for Eqs. 6, the cell loss rates of both subpopulations must be greater than zero, i.e., » A > 0 and » B > 0. Combining Equation 6 and 7 under steady-state condition, the equation for label propagation becomes,ḟ The analytical solution of this equation is, where f A0 and f B0 are the initial (at t = 0) labeling frequencies of A and B, respectively.
The solution shows that the label frequencies of subpopulations A and B will eventually equilibrate at an intermediate value between their initial label frequencies (the first term in Equation 9), which is the average of their initial labeling weighted by the net cell loss rates (» A and » B ). As a consequence, the two subpopulations reach the labeling equilibrium in opposite manners. Specifically, if A is preferentially labeled initially, i.e., f A0 > f B0 , then the label frequency of B will increase with time, whereas that of A will decline. By contrast, irreversible transition, e.g., A → B, implies that A will maintain a constant labeling (its initial value) over time, and B will increase its label frequency and finally equilibrate with that of A. This feature is governed by the following equation (setting » A to be zero in Equation 9), Taken together, reversible and irreversible transition schemes among cell subpopulations can be well distinguished by the directions of label frequencies approaching equilibrium. If subpopulations are initially labeled differently, then a reversible transition between them causes a decrease of labeling frequency in the initially more strongly labeled one. Absence of this feature, as in our data on HSC subpopulations, rules out reversibility.

Modeling H2B-GFP dynamics
To infer cell proliferation rates from the H2B-GFP dilution data, we model the temporal changes of the mean H2B-GFP intensity (total H2B-GFP intensity divided by cell number) in each cell population, which can be readily obtained from experimental data, i.e., taking the mean value of the H2B-GFP distribution followed by proper scaling for the convenience of numerical computation (mean value of the raw data divided by 1000 in this work). The cell number dynamics is given by Equation 2, so next we derive the equations for total H2B-GFP intensity in each compartment.
We assume that H2B-GFP turnover and basal production change the fluorescence intensity of all the cell populations uniformly, whereas cell proliferation, differentiation and death do so in a population-specific manner. Let X i (t) be the total H2B-GFP intensity of compartment i at time t. During time increment ∆t, H2B-GFP turnover and basal production cause the change in total intensity by, where n i (t) is the cell number of compartment i, v the basal H2B-GFP production rate per cell and k d the turnover rate.
Symmetric self-renewal redistributes existing H2B-GFP into more cells in the same population, thus altering the relative (i.e., per cell) but not the total H2B-GFP intensity of the population. By contrast, cell death or differentiation into the next population, i.e., i → i + 1, cause H2B-GFP loss from population i. The intensity loss equals the number of cells that leave the compartment times the H2B-GFP intensity taken away per cell. For example, the cell number loss after ∆t is ‚ i n i (t)∆t due to asymmetric differentiation.
The H2B-GFP level loss per cell is half of the current level, i.e., Thus, the total H2B-GFP loss after ∆t for compartment i is, Furthermore, each population, except for the tip stem cells, receives H2B-GFP from its direct upstream population via differentiation. In a similar way, we formulate this effect as, Therefore, the total H2B-GFP level of compartment i at time t + ∆t is, The dynamics of the mean H2B-GFP intensity x i (t) ≡ X i (t) n i (t) obeys, Combining this with Equation 1, we obtaiṅ This equation shows how each elementary process contributes to the H2B-GFP dynamics. First, cell death, symmetric and direct differentiation of a given population i do not affect its own mean H2B-GFP level (none of the parameters ‹ i , ȷ i ori appear in the equation). Second, H2B-GFP intensity is diluted by symmetric self-renewal and asymmetric differentiation. Third, cells entering via differentiation from the upstream population can either increase or reduce the mean H2B-GFP level, depending on the intensity difference. Finally, H2B-GFP dilution is also determined by its own degradation (with k d ). For all parameter inferences in this paper, we fixed k d to its previously determined value in murine HSCs, corresponding to a half-life of 42 days 5 , thus improving inference of the relevant rate constants of HSC and progenitor proliferation, differentiation and death. We rewrite Equation 11 using the aggregate parameters defined in Equation 3 and introduce another aggregate parameteri , the total self-renewal rate In summary, we obtain a system of differential equations for cell numbers (Equation 2), frequencies of fate-mapping label (Equation 5) and average H2B-GFP intensities (Equation 12) in hematopoietic cell populations. The key parameters characterizing the dynamics of cell proliferation and differentiation-total differentiation rates¸i , direct differentiation ratesi , net loss rates » i and total self-renewal ratesi -can all be time-dependent (e.g., change with increasing age of the animal).

Demonstrating inference of lineage pathways
Label propagation and H2B-GFP dilution experiments yield, respectively, differentiation and proliferation rates of stem and progenitor cells. We therefore reasoned that the combination of the two types of experimental data will allow inference of both differentiation and proliferation rates in a complex lineage tree. We have advanced this argument recently for the combination of HSC fate mapping and BrdU incorporation (instead of H2B-GFP dilution) 4 . The principal new aspect in this paper is that we show that combining these types of experimental data also yields information on the topology of the underlying lineage tree. To show this, we first developed the idea with a toy model ( Supplementary Fig. 4). To this end, we simulated data for a known lineage topology, and then asked whether model selection based on these in silico data with a broader set of possible lineage topologies indeed identifies the correct topology (ground truth). We collect the equations from Equation 5 and 17, The value of total net cell loss » i can be positive or negative. To be more practical, we substitute » i with other naturally positive parameters, loss rate (in the case of non-proliferating mature cells, the death rate will equal the net loss rate). The model selection successfully discovered the true model ( Supplementary Fig. 4d and e). In stark contrast, the true model could not be singled out by inference without the proliferation data ( Supplementary  Fig. 4f). To conclude, combining fate-mapping and cell proliferation data allows both identification of lineage topology and quantification the cell fluxes in the lineage pathways.
2 Inferring lineage pathways of native hematopoiesis from experimental data

Modeling lineage pathways of hematopoietic stem and progenitor cells
To address the experimental data on HSC fate mapping, augmented by tracking of mitotic history of HSCs and progenitors via H2B-GFP, we devised a family of models, with each HSC and progenitor population divided into Sca-1 high and Sca-1 low subpopulations ( Supplementary Fig. 4h). HSC subpopulations are further divided by CD201 level. The CD201 -/lo Sca-1 lo subpopulation has very few cells, so it is not considered in the models. To identify the differentiation pathways, we performed statistical model selection against the experimental data. The equations for the full model with all possible differentiation and Sca-1 transition steps read: where The subscript i in the model variables and parameters ranges from 1 to 4, representing cell compartments CD201 hi HSC, CD201 -/lo HSC, MPP and HPC-1, respectively. In addition, the subscripts ih and il stand for compartment i Sca-1 high and Sca-1 low, respectively. The variable r i is the compartment size ratio between two neighboring Scal-1 hi compartments, i.e., r i = n ih =n i−1h ; i = 2; 3; 4. s i is the size ratio between Sca-1 lo and Sca-1 hi sub-compartments at each level, i.e., s i = n il =n ih ; i = 2; 3; 4. The parameter ffi is the transition rate between Sca-1 hi and Sca-1 lo subpopulations. Detailed descriptions of all parameters are given in Supplementary Table 1.
The parameter values are inferred in a standard way by minimizing the sum of the weighted squared residuals or the ffl 2 value: where y represents the label frequency (normalized by ES HSC), compartment size and mean H2B-GFP intensity; i indicates different cell compartments, and j the measured time points. ff i;j is the corresponding standard error of the mean. The uncertainty of the parameters is quantified by profile likelihood method implemented by the Data2Dynamic package, where the likelihood is tightly linked to ffl 2 .
Model selection is done with a comprehensive array of sub-models of the most general model, thus explicitly introducing parsimony and allowing model ranking via the Akaike information criterion. The equations for any sub-model are derived from Equation 24 by setting specific parameters to zero. Moreover, we found that the experimental data cannot distinguish between cell differentiation via asymmetric cell division and direct differentiation without division (see also Barile et al., 2020). To be specific, we neglect the contribution of direct differentiation in passing on H2B-GFP label to downstream compartments, compared to label dilution via different types of cell division, and seti in the corresponding equations to zero. This assumption does not compromise the quality of the fit to the data.
We fit all the models to the Fgd5 ZsGreen:CreERT2 label propagation and H2B-GFP dilution data and ranked the models using the bias-corrected Akaike information criterion (AICc). The top ranked models show two common features: first, parallel differentiation via Sca-1 hi and Sca-1 lo paths; second, irreversible transition from Sca-1 hi to Sca-1 lo states. Models bearing one or neither of the two features ranked poorly.

Quantifying the probability of lineage pathways
There is one single pathway for lymphoid lineage, where Sca-1 level is consistently high from tip HSC to HPC-1. On the other hand, myeloid cells can be derived via 6 pathways. To quantify the contributions of these pathways, we calculated pathway probability based on differentiation history. We started at the final cell state z i , and calculated the probability of its ancestral line of cells to have differentiated via a pathway z 0 → z 1 → · · · → z i . To do this, we back-tracked from state z i and denoted the probability to have arrived from the precursor state z i−1 by p(z i−1 → |z i ). Because we consider a Markov state network, the differentiation decision at each state is independent of the differentiation history, such that the pathway probability factorizes: When an expanding cell population reaches a state of steady growth, the elementary transition probability can be written as: where the sums run over all possible precursor states; Λ is the dominant eigenvalue of the dynamic system, that is, the population growth rate; p sg (z i ) is the relative compartment size of z i to the sum of all populations at steady growth; and¸is the differentiation rate. Combining Equations 27 and 28, we calculated the probability of all six paths from tip HSC to CMP at steady growth.

Model inference including mature cell populations
We extend the current model to describe differentiation pathways of mature blood cells, with a focus on platelets. It is known that platelets arise from conventional CMP-MkP pathway. However, the labeling frequency of platelets is higher than that of MPP, which suggests that the conventional pathway is not the only route to produce platelets. We therefore extended the model to allow MkP to directly receive cells from CD201 -/lo Sca-1 lo HSCs, and fit the model to the collective data of HSPC, MkP and platelets. However, the model fails to capture the label propagation data of MkP and platelets, because the measured MkP labeling frequency is still lower than platelets (Fig. 3c). A potential solution to this controversy is to assume two, in the simplest case, heterogeneous subpopulations in MkP, one of which is produced directly from HSC ( Supplementary Fig. 8c). The model of heterogeneous MkP compartment is supported by the data (Supplementary Fig. 8d). CD48 -/lo MkP arises directly from Sca-1 lo HSC, while CD48 hi MkP is produced via the conventional CMP pathway. Moreover, the two-compartment model faithfully captured the data of MkP subpopulations.