Blood formation is believed to occur through stepwise progression of haematopoietic stem cells (HSCs) following a tree-like hierarchy of oligo-, bi- and unipotent progenitors. However, this model is based on the analysis of predefined flow-sorted cell populations. Here we integrated flow cytometric, transcriptomic and functional data at single-cell resolution to quantitatively map early differentiation of human HSCs towards lineage commitment. During homeostasis, individual HSCs gradually acquire lineage biases along multiple directions without passing through discrete hierarchically organized progenitor populations. Instead, unilineage-restricted cells emerge directly from a ‘continuum of low-primed undifferentiated haematopoietic stem and progenitor cells’ (CLOUD-HSPCs). Distinct gene expression modules operate in a combinatorial manner to control stemness, early lineage priming and the subsequent progression into all major branches of haematopoiesis. These data reveal a continuous landscape of human steady-state haematopoiesis downstream of HSCs and provide a basis for the understanding of haematopoietic malignancies.


All mature blood and immune cells are thought to derive from self-renewing and multipotent HSCs. According to the current model, initiation of differentiation is associated with the loss of self-renewal and generation of discrete multipotent, oligopotent and subsequently unipotent progenitor cell stages1,2. These lineage-restricted progenitors are thought to be generated in a stepwise manner by several subsequent binary branching decisions leading to the classical hierarchical tree-like model of haematopoiesis1,2,3,4,5,6. However, this model is mainly based on analyses of fluorescence-activated cell sorting (FACS)-purified cell populations. Even if followed up by single-cell assays3,4,7, such analyses derive average properties of predefined cell populations and thereby miss both quantitative changes within gates as well as transition states falling between often subjectively set gates.

Moreover, the lineage contribution associated with each population is typically determined by assays such as colony formation or transplantation. While these assays read out lineage potential, the actual cell fate during homeostasis in vivo may be different8,9. Depending on the assays and markers used, partly conflicting branching points and hierarchies have been proposed10,11,12,13,14.

Recent studies based on novel single-cell approaches have challenged more fundamental aspects of this classical model. For instance, unipotent progenitors can derive directly from HSCs without proceeding through oligopotent progenitors14,15 and lineage commitment was observed in progenitors proposed to be oligopotent7,10,16. However, many of these studies focused on more differentiated compartments7,10,16 or used predefined subpopulations to investigate single-cell heterogeneity7,17, impeding the characterization of transitions between cell stages. Therefore, it remains unclear how individual HSCs enter lineage commitment during homeostasis in vivo. To establish a comprehensive model of haematopoiesis that can reconcile previous findings, a combined view of transcriptomic and functional changes along the developmental progression of individual cells is required. Here we developed an approach that integrates the reconstruction of developmental trajectories18,19 with the quantitative linkage between transcriptomic and functional single-cell data17 and thus provides a detailed view on lineage commitment of individual haematopoietic stem and progenitor cells (HSPCs) into all major branches of human haematopoiesis.


Healthy human bone marrow cells were labelled with a panel of up to 11 FACS surface markers commonly used to characterize human HSPCs5,6 (see Methods and Supplementary Table 1). All HSPCs, defined by the absence of lineage markers (Supplementary Table 1) and expression of CD34 (Lin−CD34+), were individually sorted and enriched for immature cells (see Methods). The surface marker fluorescence intensities of all markers were recorded to retrospectively reconstruct immunophenotypes (CD10, CD38, CD45RA, CD90, CD135, and depending on the experiment CD2, CD7, CD49f, CD71, CD130, FCER1A, ITGA5 and KEL, Supplementary Fig. 1a). Such index-sorted HSPCs derived from the bone marrow of two healthy individuals were subjected to RNA-seq analysis (‘index-omics’, 1,034 and 379 single cells; see Supplementary Fig. 1b for the distribution of cells within classically defined gates5,6 and Supplementary Fig. 2 for quality metrics of single-cell RNA-seq) to determine their transcriptomes or individually cultured ex vivo (‘index-culture’, 2,038 single cells) to quantify megakaryocytic, erythroid and myeloid lineage potential. Subsequently, the functional and transcriptomic data sets were integrated by regression models using commonly indexed surface marker expression to identify the molecular and cellular events associated with the differentiation of human HSCs at the single-cell level (Fig. 1). To make this data type accessible, we developed indeXplorer, a web-based platform that combines features of FACS software (for example, custom gating) with tools for single-cell transcriptomics data analysis (for example, differential expression analysis, clustering, principal component analysis) in a single graphical user interface (Supplementary Fig. 3 and http://steinmetzlab.embl.de/shiny/indexplorer/?launch=yes).

Figure 1: Experimental strategy.

Adult human HSPCs were stained with antibodies against up to 11 surface markers and individually sorted for either single-cell RNA-seq or single-cell cultures. Data from the two experiments were then integrated on the basis of surface marker expression to reconstruct developmental trajectories of haematopoiesis.

Early haematopoiesis is a continuous process

HSCs and their immediate progeny, such as multipotent progenitors (MPPs) or multilymphoid progenitors (MLPs), are located in the Lin−CD34+CD38− compartment, whereas more differentiated progenitors reside in the Lin−CD34+CD38+ compartment5,7. Global gene expression analysis of single cells within these two compartments revealed fundamentally different transcriptomic structures. In both individuals, the Lin−CD34+CD38+ progenitors could be separated into clusters corresponding to distinct progenitor cell types of all major branches of haematopoiesis (Fig. 2a and see below). In contrast, clustering within the Lin−CD34+CD38− compartment was largely unstable, as demonstrated by cluster stability analysis (Supplementary Fig. 4a), the absence of clusters according to Gap statistics (Supplementary Fig. 4b), and a recently published algorithm for the clustering of single cells20 (Supplementary Fig. 4c). A simulated series of random steps from an individual cell to one of its nearest neighbours (see Methods) revealed that the majority of Lin−CD34+CD38− cells were highly interconnected, in contrast to the disconnected cell types from the Lin−CD34+CD38+ compartment (Fig. 2b). Unsupervised visualization of all individual cells irrespective of FACS markers by t-SNE confirmed that Lin−CD34+CD38− cells formed a single continuously connected entity. In contrast, Lin−CD34+CD38+ cells emerged into locally clustered cell populations, with the exception of some phenotypic common myeloid progenitors (CMPs) and CD10+ MLPs, suggesting that the classification based on differential CD38 expression is excellent, but not absolute (Fig. 2c).

Figure 2: A stem and progenitor cell continuum precedes the establishment of discrete lineages at the CD34+CD38+ stage.

(a) Hierarchical clustering of Lin−CD34+CD38− (individual 1: 467 cells, individual 2: 261 cells) and Lin−CD34+CD38+ (individual 1: 567 cells, individual 2: 118 cells) compartments for both individuals. Clustering was performed on the most variable 1,000 genes of each population. The most variable 100 genes are displayed in the heatmap. The asterisk indicates that 3 putative eosinophil/basophil/mast cell progenitor subclusters of <5 cells were merged. Cells labelled G2M showed high expression of genes indicative for G2/M phase of the cell cycle and likely clustered together based on their cell cycle state rather than cell-type-specific gene expression. (b) Random walk analysis of Lin−CD34+CD38− and Lin−CD34+CD38+ compartments for both individuals. One hundred random walks, that is, series of random steps from one cell to any of its five nearest neighbours in correlation distance space, were simulated and the number of cells reached was evaluated in relation to the total number of cells. Five-nearest-neighbour networks are depicted on the right. (c) t-SNE visualization of all cells (individual 1) highlighting the degree to which cells are associated with local clusters (left panel, see also Methods) and the immunophenotype (right panel).
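The random walk analysis described for Fig. 2b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the number of steps per walk (`n_steps`) and the random choice of start cells are assumptions made for illustration. Only the use of five nearest neighbours in correlation distance space and of one hundred walks is taken from the legend.

```python
import numpy as np

def knn_indices(expr, k=5):
    """For each cell (row of expr), return the indices of its k nearest
    neighbours in correlation distance (1 - Pearson correlation) space."""
    dist = 1.0 - np.corrcoef(expr)      # cells x cells distance matrix
    np.fill_diagonal(dist, np.inf)      # a cell is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def fraction_reached(neighbours, start, n_steps, rng):
    """Simulate one random walk and return the fraction of all cells visited."""
    visited = {start}
    current = start
    for _ in range(n_steps):
        current = int(rng.choice(neighbours[current]))  # step to a random neighbour
        visited.add(current)
    return len(visited) / neighbours.shape[0]

def mean_fraction_reached(expr, k=5, n_walks=100, n_steps=200, seed=0):
    """Average fraction of cells reached over n_walks walks.
    n_steps and the random start cells are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    neighbours = knn_indices(expr, k)
    starts = rng.integers(0, expr.shape[0], size=n_walks)
    return float(np.mean([fraction_reached(neighbours, int(s), n_steps, rng)
                          for s in starts]))
```

In such a sketch, a highly interconnected compartment would be expected to yield a larger fraction of cells reached than a compartment made of disconnected clusters, because a walk that starts inside an isolated cluster cannot leave it.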

Notably, the absence of hierarchical structures in the primitive Lin−CD34+CD38− compartment was due to the gradual nature of differences between cells in that compartment, and not due to insufficient data quality or a lack of transcriptomic heterogeneity: a principal component analysis of Lin−CD34+CD38− cells resolved more than 10 distinct, variable biological processes in this compartment, such as cell cycle activation and lineage priming (Supplementary Fig. 4d–f). These processes are tightly correlated to surface marker expression (Supplementary Fig. 4g).

Collectively, these observations are incompatible with the classical model of early haematopoiesis, which assumes a hierarchical tree-like structure of discrete progenitors downstream of HSCs. In contrast, our data suggest that HSCs and their immediate progeny are initially part of a continuum of low-primed undifferentiated (‘CLOUD’)-HSPCs within the Lin−CD34+CD38− compartment (see also below). Discrete populations are established only when differentiation has progressed to the level of restricted progenitors typically associated with the upregulation of CD38.

Lineage-restriction downstream of the HSPC continuum

To characterize the discrete populations in the Lin−CD34+CD38+ compartment, we performed gene expression and cell surface marker analyses as well as functional validations at the single-cell level. Our analyses revealed that these populations correspond to lineage-restricted progenitors of all major branches of bone marrow haematopoiesis, including B-cell progenitors of distinct stages, megakaryocyte/erythrocyte committed progenitors (ME, Ery, Mk), neutrophil-primed progenitors (Neutro), monocyte/dendritic cell (Mono/DC) progenitors, and eosinophil/basophil/mast cell progenitors (Eo/Baso/Mast), as well as immature myeloid progenitors (Fig. 3a and Supplementary Table 2). Importantly, populations cluster by cell type and not by individual in a cross-individual comparison (Fig. 3b). The comparison of the surface marker expression of these populations to the commonly applied gating scheme5 using our indexed data set showed that immunophenotypically defined oligopotent progenitor populations (megakaryocyte–erythroid progenitors, MEPs; granulocyte–monocyte progenitors, GMPs; B-cell–NK-cell progenitors, B–NKPs) consisted mainly of cell types with unilineage-specific gene expression profiles (Fig. 3c) and functional unipotency (Fig. 4a, b).

Figure 3: The Lin−CD34+CD38+ compartment consists of distinct lineage-restricted progenitors.

(a) Overview of putative cell types in individual 1 (see panel b for a comparison between individuals). Classes obtained from hierarchical clustering of the Lin−CD34+CD38+ compartment (Fig. 2a) were assigned to putative cell types based on analyses of gene and surface marker expression. The asterisk indicates that 3 putative eosinophil/basophil/mast cell progenitor subclusters of <5 cells were merged for this analysis. TFs, transcription factors. (b) Averaged gene expression profiles for cell types from both individuals defined in Fig. 2a were clustered on the basis of the 1,000 most variable genes. Only the most variable 100 genes are shown in the heatmap. (c) Index-omics display of Lin−CD34+CD38+ progenitors. Sequenced single Lin−CD34+CD38+ cells were arranged according to their cell surface marker expression in classical FACS gating strategies to identify B- and NK-cell progenitors (B–NKPs), megakaryocytic–erythroid progenitors (MEPs), common myeloid progenitors (CMPs) and granulocyte–monocyte progenitors (GMPs). Cells were colour-coded on the basis of their cell type identity from Fig. 3a.

Figure 4: Characterization of Lin−CD34+CD38+ lineage-restricted progenitors.

(a) Index-culture display of Lin−CD34+CD38+CD10− HSPCs. Single HSPCs were cultured for 3 weeks and the resulting colony type was plotted in relation to CD45RA and CD135. (b) Single cells from the ex vivo culture assay were scored as unipotent (gave rise to one lineage) or mixed (gave rise to more than one lineage). (c) Neutrophil-primed subpopulations in relation to CD45RA and CD135 surface marker expression. (d) Megakaryocytic/erythroid-primed subpopulations in relation to TFRC (CD71) mRNA and KEL mRNA expression (left panel) and erythroid colony output in relation to CD71 and KEL surface marker expression (right panel). (e) Pre-B-cell subpopulations from individual 2 in relation to CD10 surface expression and forward scatter (FSC). (f) Prospective isolation of B-cell subpopulations sB and lB using classical flow cytometry. FACS markers for IL7R and CD9 permit the separation of two populations with FSC/CD10 profiles corresponding to sB and lB, as suggested from gene expression data.

Cells within the classic GMP compartment were separated into several neutrophil-primed progenitors (N-0 to N-3), as well as into monocyte/dendritic cell progenitors (Mono/DC). The distinct neutrophil-primed progenitors probably represent progenitors at different developmental stages and granule composition (Fig. 4c and Supplementary Fig. 4h)21,22. Immunophenotypically, all neutrophil-primed progenitors express the surface markers CD135 and CD45RA, which are progressively upregulated during maturation (Fig. 4c). In contrast to neutrophil-primed progenitors, Eo/Baso/Mast progenitors did not fall into the classical GMP gate but displayed a Lin−CD34+CD38+CD10−CD45RA−CD135mid immunophenotype (Fig. 3c), and expressed transcription factors important for early MEP commitment (GATA2 and TAL1), supporting a recent study suggesting that granulocyte subtypes might derive from distinct haematopoietic lineages12.

The MEP gate consisted of megakaryocytic (Mk) progenitors expressing typical Mk genes, of erythroid-committed (E-1, E-2) progenitors of distinct developmental stages, differing in haemoglobin and GATA1 expression, as well as of subpopulations showing combined expression of megakaryocytic and erythroid genes (M/E). Our single-cell transcriptome data suggested CD71 (TFRC) and the red blood cell antigen KEL to be highly indicative for erythroid fate, which was confirmed by single-cell culture assays using CD71 and KEL as indexing antibodies (Fig. 4d).

For individual 2, two CD10+ B-cell progenitor clusters (small pre-B-cells, sB and large pre-B-cells, lB) were observed. sB was characterized by high CD9 messenger RNA expression, high CD10 surface expression and small cell size (forward scatter (FSC)), whereas lB showed high expression of interleukin-7 receptor (IL7RA) mRNA, intermediate CD10 surface levels, expression of cell-cycle-related genes and large cell size (Fig. 4e and Supplementary Fig. 4i and Supplementary Table 2). This suggests that sB corresponds to small pre-B-cells, and lB to large pre-B-cells, progenitor populations that have been well characterized in the murine system, but to a lesser extent in the human system23. To validate and prospectively isolate large pre-B-cells and small pre-B-cells, we used IL7RA and CD9 FACS markers, which allowed us to recapitulate the levels of CD10 surface expression, cell size and cell cycle activity as predicted from the index-omics data (Fig. 4f and Supplementary Fig. 4j). In contrast to individual 2, in individual 1, only small pre-B-cells were observed (Fig. 3b).

For both individuals, we also observed CD38-positive HSPCs with a gene expression profile of rather immature cells (Im) (Fig. 3a). These clustered globally with the Lin−CD34+CD38− compartment in t-SNE analyses, and expressed lower levels of CD38 (Supplementary Fig. 4k). Most of these cells displayed an immunophenotype typical for CMPs (Lin−CD34+CD38+CD45RA−CD135+); however, the composition of the cell types present in the CMP gate depends strongly on the exact gating strategy applied (see below, Supplementary Fig. 5h, i).

On the basis of these analyses, we provide markers and gating strategies for the prospective isolation of several of these newly defined populations using standard flow cytometry (Figs 3 and 4).

Developmental trajectories of early human haematopoiesis

To obtain a detailed view on the transition from stem cells to lineage-restricted progenitors in the continuous HSPC landscape, we developed STEMNET, a new dimensionality reduction algorithm. STEMNET identifies genes specific to the six Lin−CD34+CD38+ restricted progenitor populations defined above (Neutro, Eo/Baso/Mast, B-cell, Mono/DC, Ery and Mk; see Supplementary Table 3 for a list of genes used by STEMNET) and then computes the probability that each primitive (‘CLOUD’) HSPC can be assigned to any of these classes. STEMNET thereby places the six developmental endpoints on the corners of a simplex. This resulted in the arrangement of the least-primed HSCs, such as CD49f+ HSCs, to the centre, and the remaining HSPCs localizing in between according to their degree of priming (Fig. 5a, and see Supplementary Fig. 5a, b for individual 2). To describe the position of each cell we computed the predominant direction of priming d as the developmental endpoint closest to the cell and the degree of lineage priming Srel as the (Kullback–Leibler) distance from the least-primed cell.
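The two summary quantities d and Srel can be illustrated with a short Python sketch that starts from an already computed matrix of class probabilities (one row per cell, one column per progenitor class). This is not the STEMNET implementation, and the classifier that produces the probabilities is not shown. The function names, the choice of the most uniform cell as the least-primed reference, and the scaling of Srel to a maximum of 1 are assumptions made for illustration.

```python
import numpy as np

LINEAGES = ["Neutro", "Eo/Baso/Mast", "B-cell", "Mono/DC", "Ery", "Mk"]

def priming_summary(probs, eps=1e-12):
    """Return the direction of priming d (index of the most probable class)
    and a relative degree of priming for each cell."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, None)
    probs = probs / probs.sum(axis=1, keepdims=True)   # rows sum to 1
    direction = probs.argmax(axis=1)                   # closest endpoint
    # Assumption: the least-primed cell is the one with the highest entropy.
    entropy = -(probs * np.log(probs)).sum(axis=1)
    reference = probs[entropy.argmax()]
    # Kullback-Leibler divergence of each cell from the reference cell.
    kl = (probs * np.log(probs / reference)).sum(axis=1)
    # Assumption: scale so that the most primed cell has a value of 1.
    s_rel = kl / kl.max() if kl.max() > 0 else kl
    return direction, s_rel

def circle_projection(probs):
    """Place the classes at equally spaced points on a unit circle and
    position each cell at the probability-weighted mean of those points."""
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[1]
    angles = 2 * np.pi * np.arange(n_classes) / n_classes
    corners = np.column_stack([np.cos(angles), np.sin(angles)])
    return probs @ corners
```

Under these assumptions, a cell with equal probability for all six classes lies at the centre of the circle with a degree of priming of 0, whereas a cell assigned almost entirely to one class lies near the corresponding corner.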

Figure 5: Visualization of the HSPC continuum.

(a) The similarity of every cell to each of the progenitor classes was computed by STEMNET (see Methods), projected on a unit circle, and used to quantify the degree and direction of transcriptomic priming. Data from individual 1 are shown; for individual 2 see Supplementary Fig. 5a, b. (b) Immunophenotypic populations5,6 were highlighted on the HSPC continuum. pshift indicates P values calculated by kernel-density-based tests comparing each population with CD49f+ HSCs. For CMPs, see Supplementary Fig. 5h, i. For CD49f+ HSCs, n = 101 single cells; CD49f− HSCs, n = 117; MPPs, n = 176; CD10− MLPs, n = 52; CD10+ MLPs, n = 16; B–NKPs, n = 26; GMPs, n = 244; MEPs, n = 231.

Figure 6: The direction of transcriptomic priming is quantitatively linked to functional lineage potential.

(a) Comparison of the predominant direction of priming d (lympho/myeloid versus megakaryocyte/erythroid) obtained from single-cell transcriptomics to the dominant cell type observed in colonies from single-cell culture. (i) Illustration. (ii) Qualitative comparison of the two quantities with respect to CD45RA and CD135 surface marker expression. (iii) Quantitative link. The most likely dominant direction of priming was estimated for each founder cell from index-culture based on regression models constructed on all surface markers and compared with the observed colony composition (see Supplementary Fig. 7a). P values are from a Fisher test with n = 434 cells (left panel) and n = 193 cells (right panel). (b) Comparison between inferred amount of transcriptomic Mk priming and the percentage of CD41+ Mk cells per colony. Error bars denote s.e.m. P value is from a Pearson product moment correlation test with n = 627 single cells that formed colonies. See also Supplementary Fig. 7c. (c) Comparison between inferred amount of transcriptomic erythroid priming and the percentage of CD235+ erythroid cells per colony. See also Supplementary Fig. 7c. Error bars denote s.e.m. P value is from a Pearson product moment correlation test with n = 627 single cells that formed colonies. (d) Xenotransplantation validating a Mk-primed MPP population identified by STEMNET. HSCs, MLPs and a population of putatively Mk-primed MPPs (Lin−CD34+CD38−CD45RA−CD90−CD135−) were sorted and transplanted into immunocompromised mice, and chimaerism of human lympho/myeloid cells (CD45+), thrombocytes and erythrocytes was determined 2 weeks post transplantation. Experimental set-up (top right panel), localization of populations in STEMNET (left panels), and human engraftment (bottom right panels, error bars denote s.e.m.) are indicated. Relative contribution of thrombocytes was significantly higher in Mk-primed MPPs compared with HSCs (P = 0.0031) and MLPs (P = 0.0002, two-tailed unpaired t-test, n = 6 HSCs, n = 4 Mk-primed MPPs, n = 3 MLPs). Asterisks indicate level of significance as follows: **P < 0.01; ***P < 0.001.

Figure 7: The degree of transcriptomic priming is quantitatively linked to multipotency and proliferative capacity.

(a) Comparison between the inferred amount of transcriptomic priming Srel of the founder cell and the resulting colony size (cell number). (i) Illustration. (ii) Qualitative link. (iii) Quantitative link. Error bars denote s.e.m. P value is from a Pearson product moment correlation test with n = 1,031 single cells. (b) Comparison between the inferred amount of priming Srel of the founder cell and the number of cell types in the colony. P value is from a Pearson product moment correlation test with n = 1,031 single cells. (c) Inferred transcriptomic degree of priming Srel (x axis) in relation to the colony size (y axis) and the number of cell types per colony (colour code). (d) Distribution of colony types in relation to the presence or absence of erythropoietin (EPO) in the culture medium.

This analysis suggests that HSCs located in the centre of the ‘CLOUD’ gradually acquired continuous lineage priming into either of the major branches. While lympho/myeloid and megakaryocytic/erythroid priming formed major points of attraction, a clear separation into single lineages was not present at this stage (Fig. 5a). In contrast, lineages were clearly separated at the level of Lin−CD34+CD38+ progenitors, without further sub-branching in this compartment (Fig. 5a, see Supplementary Fig. 5c for CD38 expression). Importantly, these results are not due to limitations of the bioinformatics method, as STEMNET is able to detect both subsequent branching points and discrete intermediate populations on simulated data (Supplementary Fig. 6a–d). Moreover, applying diffusion pseudotime (DPT), a different recently published method for the inference of developmental trajectories24, to our data confirmed the absence of subsequent binary branch points and the direct lineage commitment from CLOUD-HSPCs along continuous trajectories (Supplementary Fig. 6e).

Within the differentiation continuum, STEMNET analysis located previously defined immunophenotypic populations according to their known lineage potential5 (Fig. 5b, see Supplementary Fig. 5b for individual 2). For example, GMPs were distributed to the neutrophil and monocytic/dendritic cell branches while MEPs were located to the megakaryocytic and erythroid branches (notice that the localization of CMPs critically depends on the exact CD38 and CD135 gating strategy, Supplementary Fig. 5h, i). In contrast, immunophenotypic MLPs were located close to the separation of lymphoid, neutrophil and monocytic/dendritic cell lineages (Fig. 5b and Supplementary Fig. 5b), with individual cells already primed towards specific lineages, in line with frequent functional commitment to single lineages in mouse LMPPs15. Together, these analyses suggest that developmental stages immediately downstream of HSCs such as MLPs and MPPs do not represent discrete cell types located at defined branching points, but should rather be considered as transitory states within the HSPC continuum with higher probability for commitment to particular lineages.

While undergoing lineage commitment only very few cells acquired a transcriptomic state of dual-lineage priming (Supplementary Fig. 5d, e), in accordance with a recent single-cell transcriptomic study on mouse GMPs20. However, our analyses suggest that a direct transition from a primed multi-lineage towards a unilineage transcriptomic state represents the main route of lineage commitment, whereas dual-lineage states (such as Gfi1+Irf8+ GMPs, Supplementary Fig. 5f) exist, but represent rare exceptions. Importantly, both transcriptomic and functional (Supplementary Fig. 5g) lineage combinations of bipotent cells were not restricted to the combinations predicted by the classical model, conflicting with a strictly ordered hierarchy of branching events. Along these lines, co-expression of opposing pairs of transcription factors, such as IRF8 and PU.1 (SPI1) that have been thought to establish an oligopotent state, occurred at much lower frequency than previously expected (see Fig. 8a(viii, xi))25.

Figure 8: Lineage commitment is a layered multi-step process.

(a,b) Activity of gene modules associated with developmental progression of HSPCs. Genes depending on the degree and/or direction of priming were identified and clustered into modules displaying similar expression patterns (see Methods). Averaged gene expression of selected modules from individual 1 was highlighted in the HSPC differentiation continuum (a) or smoothened and plotted against the degree of lineage-specific priming (b). For a complete list of modules and individual 2, see Supplementary Fig. 8 and Supplementary Table 4. (c) Gene ontology and FACS marker changes along the early priming of HSPCs (Srel < 0.4). During later stages of priming, GO activity and FACS marker expression additionally depend on the direction of priming (not shown). (d) Graphical summary of a continuum-based model of bone marrow haematopoiesis. Due to the interactions of gene regulatory networks, some cell states and transitions are more likely than others, represented by a lower elevation within a Waddington landscape. During early lineage commitment, small barriers between lineages arise early, thereby creating lineage biases in HSCs. At the progenitor stage these barriers are already more pronounced, making the oligopotent stage less likely. Note that T- and NK-cell development predominantly occurs outside the bone marrow42.

Transcriptomic priming mediates lineage commitment

Single-cell RNA-seq protocols require cell lysis and therefore prohibit subsequent functional interrogation of the same single cell. However, the use of indexed FACS surface markers common to both single-cell ex vivo culture data and single-cell RNA-seq data allowed us to quantitatively link the amount and direction of transcriptomic priming to functional properties such as lineage potential and proliferative capacity. For example, the STEMNET-predicted dominant direction of transcriptional priming into the lympho/myeloid versus the megakaryocytic/erythroid direction was strongly correlated to the surface marker expression of CD135 and CD45RA (Fig. 6a(i, ii)), which could be used to qualitatively predict the predominant cell type in colonies of our single-cell cultures (note that lymphoid progenitors do not grow in these conditions, and that myeloid sublineages are not resolved) (Fig. 6a(ii)). Utilizing all recorded surface markers for linear models on the single-cell RNA-seq data allowed us to quantitatively predict the dominant cell type present in the single-cell cultures for the Lin−CD34+CD38+ (P = 3.7 × 10^−23) and the Lin−CD34+CD38− compartment (P = 3.7 × 10^−22, Fig. 6a(iii) and Supplementary Fig. 7a for the full specification of regression models). Moreover, predicting erythroid and megakaryocytic priming individually revealed that the amount of lineage-specific priming was linked to functional lineage commitment (Fig. 6b, c and Supplementary Fig. 7b, c). However, colonies derived from Mk-primed cells were frequently dominated by other cell types due to their lower proliferative capacity ex vivo (Supplementary Fig. 7b). STEMNET further predicted Lin−CD34+CD38−CD45RA−CD90−CD135− cells to be primed towards megakaryocytic differentiation (Fig. 6d, left panel). To functionally validate this prediction in vivo, we FACS-sorted these cells, transplanted them into sublethally irradiated NSG mice and quantified their lineage output 14 days post transplantation. As predicted, these cells, which we termed Mk-primed MPPs, predominantly generated thrombocytes when compared with MLPs and HSCs (Fig. 6d, right panel). Together, these analyses revealed that transcriptomic priming is linked to the restriction of lineage potential at an early stage in vitro and in vivo.
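The idea of linking the two data sets through shared surface markers can be sketched in Python as follows. This is a minimal illustration using ordinary least squares, not the regression models specified in Supplementary Fig. 7a; the function names and the data layout (cells in rows, shared markers in columns) are hypothetical.

```python
import numpy as np

def fit_marker_model(markers, priming):
    """Fit a linear model (with intercept) that predicts a transcriptomic
    priming score from surface marker intensities of sequenced cells."""
    design = np.column_stack([np.ones(len(markers)), markers])
    coef, *_ = np.linalg.lstsq(design, priming, rcond=None)
    return coef

def predict_from_markers(coef, markers):
    """Apply the fitted model to cells for which only surface markers were
    recorded, for example founder cells of single-cell cultures."""
    design = np.column_stack([np.ones(len(markers)), markers])
    return design @ coef
```

In this sketch the model is fitted on index-omics cells, where both markers and transcriptomic priming are known, and then applied to index-culture cells, so that the inferred priming can be compared with the observed colony composition.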

We next estimated the degree of transcriptomic lineage priming Srel for individual cells from the culture experiments (Fig. 7a, b). As expected, committed progenitors with a high degree of inferred transcriptomic lineage priming formed small colonies (Fig. 7a) of a single-cell type (Fig. 7b). In contrast, primitive HSPCs (low inferred Srel) frequently displayed multi- or bilineage potential (Fig. 7b) and generated much larger colonies (Fig. 7a). However, not all of the primitive HSPCs displayed multipotency, but frequently appeared to be lineage-restricted while typically retaining a high proliferative capacity comparable to their multipotent counterparts (Fig. 7c). These data suggest that proliferative capacity and lineage potency are not obligatorily linked.
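A comparison of this kind, colony size as a function of the inferred degree of priming, can be sketched in Python as below. The use of five equal-width bins is an assumption (the figures refer to Srel bins 1 to 5, but the binning scheme is not stated here), and the function name is hypothetical.

```python
import numpy as np

def binned_mean_sem(s_rel, values, n_bins=5):
    """Group founder cells into equal-width bins of inferred priming and
    return (mean, s.e.m., n) per bin plus the Pearson correlation."""
    s_rel = np.asarray(s_rel, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.linspace(s_rel.min(), s_rel.max(), n_bins + 1)
    bins = np.digitize(s_rel, edges[1:-1])       # bin index 0 .. n_bins - 1
    summary = []
    for b in range(n_bins):
        v = values[bins == b]
        mean = v.mean() if v.size else np.nan
        sem = v.std(ddof=1) / np.sqrt(v.size) if v.size > 1 else np.nan
        summary.append((mean, sem, v.size))
    r = np.corrcoef(s_rel, values)[0, 1]         # Pearson correlation coefficient
    return summary, r
```

A negative correlation coefficient in such an analysis would correspond to the observation that more strongly primed founder cells form smaller colonies.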

To investigate the ability of cells with various amounts of priming to switch lineage potential, we cultured HSPCs in the absence and presence of erythropoietin (EPO). Progenitors that formed exclusively erythroid colonies in the presence of EPO were unable to give rise to alternative lineages in the absence of EPO (Fig. 7d). Moreover, we cultured single HSPCs for one week, split the colonies into four and determined the lineage outcome of the daughter colonies two weeks later. In line with the predictions of our model, the degree of transcriptomic priming was anticorrelated to the propensity of cells to generate daughters with variable lineage composition (Supplementary Fig. 7d, e). Together, these results support the hypothesis that early lineage priming of primitive HSPCs coincides with a loss of functional plasticity.

Molecular processes underlying HSC commitment

To characterize stemness, early lineage priming and transcriptional cell type manifestation on the molecular level, we identified co-expressed gene modules whose activities were associated with the direction and/or the degree of priming. We visualized the activity of these gene modules on the differentiation landscape established above (Fig. 8a(i)) and along the progression from HSCs to each of the six lineages (Fig. 8b and Supplementary Fig. 8a, b and Supplementary Table 4 for a complete list). Importantly, data from both individuals yielded highly comparable results (Supplementary Fig. 8). To gain additional information about biological processes associated with HSC differentiation, we determined the mean expression of genes for each gene ontology (GO) term, and selected representative examples that changed significantly during early lineage priming (Fig. 8c). Together, these analyses provide insights into the global molecular and cell biological processes HSCs encounter while undergoing continuous lineage priming, unilineage commitment and subsequent differentiation.
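The per-cell GO term activity described above, the mean expression of all genes annotated to a term, can be sketched in Python as follows. The function name and the input format (a cells-by-genes matrix, a list of gene names and a dictionary mapping each term to its genes) are assumptions made for illustration; the statistical test used to select significantly changing terms is not shown.

```python
import numpy as np

def go_term_scores(expr, gene_names, go_terms):
    """Return, for each GO term, the mean expression per cell across the
    genes annotated to that term. Terms without measured genes are skipped."""
    index = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for term, genes in go_terms.items():
        columns = [index[g] for g in genes if g in index]
        if columns:
            scores[term] = expr[:, columns].mean(axis=1)
    return scores
```

The resulting per-cell scores could then be related to the degree of priming to identify processes that change during early lineage priming.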

The least-primed state was characterized by expression of the HOXA3/PRDM16/HOXB6 module26,27,28 (Fig. 8a(ii), b and Supplementary Table 4) and associated with typical stem cell properties such as cell cycle quiescence, low expression of the entire gene expression machinery, low total RNA content (measured by mRNA reads per in vitro spike-in RNA read), low cellular respiration29, low CD38 and high CD90 surface expression5 (Fig. 8c). The expression of the HLF/ZFP36L2 module (which also contains the transcription factors MECOM/EVI1, HLF, GATA3) was highest in immature HSCs, but present in the entire ‘CLOUD’ (Fig. 8a(iii), b and Supplementary Table 4)30,31,32.

Intriguingly, stem cells also expressed genes from the earliest priming modules from both the lympho/myeloid (FLT3/SATB1 module) and the megakaryocyte/erythrocyte (GATA2/NFE2 module)33 lineages in a non-exclusive manner (Fig. 8a(iv–v)). These data suggest that the first transcriptional priming events into the predominantly lympho/myeloid or the megakaryocyte/erythrocyte direction are already present in most primitive HSCs, coinciding with the occurrence of first functional lineage biases already at this stage (Figs 6a, b, 7a Srel bin 1 and 2). A number of additional gene modules were activated in a combinatorial fashion between lineages, similar to previous observations from bulk RNA-seq34 (Fig. 8 and Supplementary Fig. 8a and Supplementary Table 4).

Following acquisition of lineage priming, HSCs upregulate their gene expression machinery, mRNA and protein biosynthesis, and respiration29,35, while cell cycle activity increases only marginally (Fig. 8c). At this stage, cells start to express lineage-specific gene modules, for example the SPI1/GFI1 module for the neutrophil lineage (Fig. 8a(viii)) or the IRF1/CASP1 module33 for the B-cell lineage (Fig. 8a(vi)). Other modules active at this stage, however, are shared between lineages; for example, the TAL1/HSF1 module is shared between the erythroid and the megakaryocytic lineage, whereas the EAF2/KLF4 module is shared between the neutrophil and the monocyte lineage. This coincides with the observation that most progenitors at this stage display narrow restriction in their developmental potential, whereas some progenitor cells remain oligopotent15 (Fig. 7b, Srel bin 3).

Manifestation of lineage-specific differentiation is accomplished by activation of gene modules such as the CEBPA/CEBPD module for the neutrophil lineage, the EBF1/ID3 module for the B-cell lineage, the IRF8 module for the monocytic/dendritic lineage, the GP1BB/PBX1 module for the megakaryocytic lineage and the GATA1/KLF1 module for the erythroid lineage33,36,37 (Fig. 8a(x–xv), b). In all cases, this step is accompanied by cell cycle activation, CD38 surface marker upregulation (Fig. 8c) and unipotency (Fig. 7b, Srel bin 4 and 5).

Together, our data suggest that HSCs are characterized by the expression of specific stem cell modules in combination with early, probably antagonizing priming modules. During the continuous priming and differentiation process the stem cell modules and certain (but not all) early priming modules already expressed in HSCs are turned off, while specific lineage modules become reinforced to drive differentiation towards lineage commitment and manifestation (Fig. 8a, b). Transcription factors from upstream modules may trigger expression of downstream modules, as in the case of GATA2, TAL1 and GATA133. In contrast, transcription factors from mutually exclusive downstream modules may inhibit each other; for example, IRF8 is known to repress CEBPA38. Such inhibitory interactions may render oligopotent progenitors unstable7,10,15, and thus less abundant than previously anticipated (Fig. 7b). In contrast, in cells with a low amount of priming, expression levels of mutually exclusive modules are sufficiently small to allow uni-, oligo- or multipotency.


In summary, we provide a global view of the early human haematopoiesis during homeostasis. Our data set combines both information on the lineage potential of HSCs (index-culture) and insights into the unperturbed lineage commitment of HSCs during human haematopoiesis (reconstruction of developmental trajectories from static single-cell expression data), where lineage tracing approaches8,9 are not possible. Here, we rely on single-cell culture data and xenotransplantation for functional validation, which unlike gene expression or cellular barcoding measure developmental potential, not fate.

Our results are incompatible with fundamental aspects of the differentiation-tree model, in which HSCs are required to pass through discrete and definable intermediate progenitor cell stages by subsequent binary cell fate decisions made on branching points. Instead, we propose that early haematopoiesis is represented by a cellular continuum of low-primed undifferentiated (CLOUD)-HSPCs. This HSPC continuum contains phenotypic MPPs and MLPs, which do not constitute discrete progenitor cell types, but rather transitory states. CLOUD-HSPCs gradually acquire transcriptomic lineage priming in a combination of multiple directions, with some cell state transitions and lineage combinations more likely to occur than others. Distinct lineages emerge directly from CLOUD-HSPCs, earlier than previously anticipated and without passing through a series of discrete, stable progenitors. Our data suggest a multidimensional molecular and cellular landscape of steady-state human haematopoiesis defined by a continuous flow of differentiation and emergence of lineage trajectories independent of each other. This landscape can be visualized by using the classical Waddington’s landscape as a blueprint39,40,41, which more appropriately reflects the continuous nature of haematopoiesis than a ‘cell type tree’ (Fig. 8d). Haematopoietic stem cells reside in a flat valley at the top. Barriers separating individual lineages emerge early and deepen gradually, illustrating the acquisition of lineage biases driven by small differences in gene expression of early fate mediators. When barriers become insurmountable, cell type manifestation and lineage commitment are established.

While our study provides detailed insight into lineage commitment from HSCs into all branches of human bone marrow haematopoiesis, it does not cover lineage decisions occurring further downstream or outside the bone marrow, such as T-cell development. Given the low frequency of eosinophil/basophil/mast cell and monocyte/dendritic cell progenitors within the CD34+ bone marrow compartment, our study cannot fully resolve the separation and maturation of these lineages.

Together, our data determine a comprehensive continuum-based model of early human haematopoiesis, which will probably have important implications for the aetiology of haematologic disorders and which may serve as a paradigm for other adult stem cell systems.


Bone marrow aspirations.

Bone marrow aspirates from healthy individuals between 25 and 39 years of age were obtained at the University clinics in Heidelberg and Mannheim after written informed consent. The use of human samples for RNA-seq and functional studies was approved by the local ethics committees in accordance with the Declaration of Helsinki. Bone marrow mononuclear cells were isolated by gradient centrifugation using Histopaque-1077 (Sigma).

Flow cytometry.

Bone marrow mononuclear cells were stained with surface markers for 30 min on ice according to standard protocols. For FACS sorting, BD FACS Aria II/III or Fusion flow cytometers (BD Biosciences) equipped with 405 nm, 488 nm, 561 nm and 633 nm (Aria)/642 nm (Fusion) lasers were used. For flow cytometric analyses, LSRII and LSRFortessa flow cytometers (BD Biosciences) equipped with 350 nm, 405 nm, 488 nm, 561 nm and 640 nm lasers were used. For Ki67-Hoechst cell cycle analysis, surface staining was performed as described previously43. Subsequently, cells were fixed and permeabilized using cytofix–cytoperm buffer (BD Biosciences), and incubated with Ki67 antibody overnight at 4 °C. Cells were stained with 2 μg ml−1 Hoechst 33342 (Invitrogen) and analysed. Data were analysed using FlowJo (TreeStar), indeXplorer or R.

Single-cell liquid cultures (‘index-cultures’).

Fresh human bone marrow mononuclear cells were stained as described above with fluorescence-labelled antibodies against CD2, CD34, CD38, CD45RA, CD71, CD90, CD130, CD135, CD238 (KEL), FcεRI and a lineage cocktail consisting of CD4, CD8, CD11b, CD14, CD19, CD20, CD56, CD235a and CD10. Single Lin−CD34+CD38+CD10− and Lin−CD34+CD38−CD10− HSPCs were sorted into ultralow attachment 96-well plates (Corning) containing 100 μl StemSpan SFEM media (Stem Cell Technologies), L-glutamine (100 ng ml−1), penicillin/streptomycin (100 ng ml−1) and the following human cytokines: SCF (20 ng ml−1, Peprotech), Flt3-L (20 ng ml−1, Peprotech), TPO (50 ng ml−1, Peprotech), IL-3 (20 ng ml−1, Peprotech), IL-6 (20 ng ml−1, Peprotech), G-CSF (20 ng ml−1, Peprotech), IL-5 (20 ng ml−1, Peprotech), M-CSF (20 ng ml−1, Peprotech), GM-CSF (20 ng ml−1, Peprotech) and EPO (4 U ml−1, R&D). For the experiment displayed in Fig. 7d, EPO was left out from the medium. Note that the CD38+ and CD38− gates were set to touch (see also Supplementary Fig. 1a).

Fluorescence intensities were recorded for every channel for each sorted cell and used to retrospectively reconstruct immunophenotypic populations. Cells were cultured for 21 days at 5% CO2 and 37 °C. To characterize clonal progeny, colonies were imaged by microscopy and subsequently analysed for CD15, CD33, CD41a and CD235a expression by flow cytometry. Note that under these conditions, only myeloid (CD33), erythroid (CD235a) and megakaryocytic (CD41a) colonies are efficiently generated. Colonies were judged on the basis of their visual morphology and expression of surface markers. Colony size and lineage output were based on flow cytometry and confirmed by microscopy. A colony was determined to be positive for a particular lineage if ≥10 cells of the respective cell type were detected.
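The lineage call described above can be sketched as a simple threshold rule. This is a minimal illustration, not the authors' scoring code: the function name `call_lineages` and the input layout (a mapping from marker to the number of positive cells in a colony) are hypothetical, and only the three markers whose lineage assignment is stated in the text are included.

```python
# Marker-to-lineage mapping as stated in the text; CD15 is omitted because
# its lineage assignment is not spelled out there.
LINEAGE_MARKERS = {
    "CD33": "myeloid",
    "CD235a": "erythroid",
    "CD41a": "megakaryocytic",
}


def call_lineages(marker_counts, min_cells=10):
    """Return the sorted lineages with at least `min_cells` positive cells.

    `marker_counts` is a hypothetical per-colony summary: marker -> cell count.
    """
    return sorted(
        lineage
        for marker, lineage in LINEAGE_MARKERS.items()
        if marker_counts.get(marker, 0) >= min_cells
    )
```

For example, a colony with 120 CD33+ cells, 9 CD235a+ cells and 10 CD41a+ cells would be called myeloid and megakaryocytic, but not erythroid.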

For the ‘split-in-four’ experiment (Supplementary Fig. 7d, e), colonies were evaluated 7 days after seeding of single cells and colonies with more than 50 cells were equally split into 4 wells and cultured for an additional 14 days before colony size and lineage output were scored.

Mouse experiments.

NSG mice were bred and housed under specific pathogen-free conditions at the central animal facility of the German Cancer Research Center. All animal experiments were approved by the Regierungspräsidium Karlsruhe under Tierversuchsantrag numbers G108/12 and G210/12.

A total of 17,000 FACS-sorted HSCs (Lin−CD34+CD38−CD90+CD45RA−), MLPs (Lin−CD34+CD38−CD45RA+) or Mk-primed MPPs (Lin−CD34+CD38−CD90−CD135−) from healthy bone marrow were injected into the femoral bone marrow cavity of female mice at 15 weeks of age that had been sublethally irradiated (200 cGy) 24 h before injection.

Two weeks after xenotransplantation, lineage-specific human engraftment in the injected femur was evaluated by flow cytometry using anti-human-CD45-PE, anti-human-CD235a-APC and anti-human-CD41a-FITC antibodies.

Single-cell transcriptome sequencing (‘index-omics’).

A 25-year-old male donor (individual 1) and a 29-year-old female donor (individual 2) were selected for single-cell RNA-seq. Fresh bone marrow mononuclear cells were stained as described above with fluorescence-labelled antibodies against CD34, CD38, CD45RA, CD90, CD49f, CD135, CD10, CD7 and a lineage cocktail consisting of CD4, CD8, CD11b, CD14, CD19, CD20, CD56 and CD235a. Fluorescence intensities were recorded for every channel for each sorted cell and used to reconstruct immunophenotypic populations subsequently.

While the frequently used smart-seq2 protocol44 failed to amplify transcriptomes from bone marrow-derived human HSPCs, both the QUARTZ-seq protocol45 and a modified smart-seq2 protocol (see below) yielded good-quality cDNA (Supplementary Fig. 2a). To avoid method-specific biases, data were generated using both QUARTZ-seq (individual 2) and smart-seq2.HSC (individual 1), and all findings were systematically compared between individuals (Figs 2 and 3b and Supplementary Figs 4a, b, 5a, b and 8c).

For individual 1, eight plates of Lin−CD34+CD38− and six plates of Lin−CD34+CD38+ HSPCs were sorted and whole transcriptome amplification was performed using the smart-seq2 protocol44, but using 5 μl of a modified RT buffer containing 1× SMART First Strand Buffer (Clontech), 1 mM dithiothreitol (Clontech), 1 μM template switching oligo (Exiqon), 10 U μl−1 SMARTScribe (Clontech) and 1 U μl−1 RNASin plus (Promega). ERCC spike-ins were included at a final dilution of 1:1,000,000. Libraries were constructed using a home-made Tn5 transposase (based on ref. 46). Note that the CD38+ and CD38− gates were set to touch (see also Supplementary Fig. 1a).

For individual 2, eight plates of Lin−CD34+CD38−, one plate of Lin−CD34+CD38−CD90+CD45RA− and four plates of Lin−CD34+CD38+ HSPCs were sorted and whole transcriptome amplification was performed using the QUARTZ-Seq protocol45. ERCC spike-ins were included into the lysis buffer at a final dilution of 1:2,000,000. Libraries were constructed using Nextera Tn5 (Illumina) following the protocol provided, but using 1/4 of all volumes. Libraries were then sequenced on an Illumina HiSeq 2500 platform.

Raw data processing and quality control.

Reads were demultiplexed and, where applicable, the remaining poly-A tail of the mRNA was trimmed off. Reads were then aligned to the Homo sapiens genome (build 37.68, also containing the ERCC spike in sequences) using GSNAP47, with the expected paired-end length set to 400 bp and the allowable deviation from the expected paired-end length set to 100 bp. Reads overlapping uniquely with mRNA genes were counted using HTSeq48. As a first filtering step, we retained all cells in which we observed more than 750 genes at a minimum of 10 reads each, and a total of at least 150,000 reads. We removed all genes from the data set that were not observed by at least 10 reads in at least 5 cells. Statistics on these filtering steps are displayed in Supplementary Fig. 2.
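The two filtering steps can be sketched as follows. This is an illustrative stand-in rather than the pipeline used in the study: the function name `filter_cells_and_genes` and the nested-dictionary input are hypothetical, and the sketch assumes that gene support is counted over the retained cells only, which the text does not state explicitly.

```python
def filter_cells_and_genes(counts, min_genes=750, min_reads=10,
                           min_total=150_000, min_cells=5):
    """Apply the cell filter, then the gene filter.

    `counts` maps cell id -> {gene: read count}. Returns (cells, genes) kept.
    """
    kept_cells = []
    for cell, gene_counts in counts.items():
        detected = sum(1 for r in gene_counts.values() if r >= min_reads)
        total = sum(gene_counts.values())
        # "more than 750 genes" is strict; "at least 150,000 reads" is inclusive.
        if detected > min_genes and total >= min_total:
            kept_cells.append(cell)

    # Assumption: a gene must reach `min_reads` in at least `min_cells`
    # of the retained cells.
    support = {}
    for cell in kept_cells:
        for gene, r in counts[cell].items():
            if r >= min_reads:
                support[gene] = support.get(gene, 0) + 1
    kept_genes = sorted(g for g, n in support.items() if n >= min_cells)
    return kept_cells, kept_genes
```

The defaults mirror the thresholds quoted above; passing smaller values makes the behaviour easy to check on toy data.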

We then fitted error models49 to the readcount data (see also below). In 35 cells of individual 2 and 1 cell of individual 1, we observed an extreme overdispersion of the genes classified as non-dropout events. These cells were removed. In individual 1, we further excluded 13 cells with an abnormal CD38−CD90high immunophenotype (Supplementary Fig. 1a). These cells were clear outliers also with regard to gene expression, as they mostly expressed genes associated with various types of mature immune cell (not shown).

Data normalization using posterior odds ratio.

We designed a normalization method to address the following two challenges: single-cell transcriptomics has large technical variability; and human haematopoietic stem and progenitor cells largely differ in RNA content (Supplementary Fig. 2h).

While lowly expressed genes are sometimes observed in cells with high total RNA content, they are almost never seen in cells with low total RNA content (Supplementary Fig. 2i). As this effect is the same for all genes of low expression level, it will induce some correlation structure on the data. In our data set, the first principal component was correlated to the library size and mRNA content, which may dominate over the effects of developmental transitions (Supplementary Fig. 2j, panel i). Normalization through division by total library size or harmonic mean estimator does not resolve this issue, as lowly expressed genes are still unobserved (zero) in cells of low mRNA content (Supplementary Fig. 2i, j panel ii). We and others have therefore used hierarchical models that assume that molecule counts are created by sampling from the true amount of mRNA molecules with cell-specific sampling efficiencies50,51. To adapt these approaches to the case where no molecular barcodes were used, we here use the error model of ref. 49, which describes the posterior probability of a gene expression level $x$ in a cell $c$ as

$$p(x \mid r_c, \Omega_c) = p_d(x)\, p_{\mathrm{Poisson}}(x) + \bigl(1 - p_d(x)\bigr)\, p_{\mathrm{NB}}(x \mid r_c),$$

where $p_d$ is the probability of a dropout event at gene expression $x$, $p_{\mathrm{NB}}$ is the probability of observing $r_c$ reads in the case of no dropout and $p_{\mathrm{Poisson}}(x)$ is the probability of observing $r_c$ spurious reads in the case of a dropout. $\Omega_c$ is a vector of cell-specific and numerically optimized parameters: the slope and intercept of $p_d$ as a function of $r_c$; the slope and intercept of $x$ as a function of $r_c$; the dispersion of the negative binomial distribution $p_{\mathrm{NB}}(x \mid r_c)$; and the background frequency $\lambda$ of the Poisson distribution, which was fixed to 0.1.

The maximum posterior average expression across all cells is then given by

$$\bar{x} = \arg\max_x \prod_c p(x \mid r_c, \Omega_c).$$

While the mean of $p(x \mid r_c, \Omega_c)$ describes the expression magnitude of a gene in a given cell, its spread describes the uncertainty due to technical noise. To obtain a single number that weighs expression magnitude by confidence level, we compute a posterior odds ratio (POR):

$$\mathrm{POR} = \log_2 \frac{P(x > \bar{x} \mid r_c, \Omega_c)}{P(x < \bar{x} \mid r_c, \Omega_c)}.$$

POR can be interpreted as the evidence (in bits) that a specific gene in a specific cell is expressed more highly (or lowly) than in the average cell. The use of POR scores in principal component analysis solved the problems associated with the above-mentioned normalization strategies (Supplementary Fig. 2j panel iii). POR scores were used as the measure of gene expression for all analyses.
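As a rough illustration of how such a score behaves, the sketch below works on posteriors that have already been evaluated on a discrete grid of expression levels. It is not the authors' implementation. It assumes that the across-cell average is the grid point maximizing the product of the per-cell posteriors, and that the POR is the base-2 log odds that a cell's expression lies above rather than at or below that average. The function names and the small constant `eps` are hypothetical.

```python
import math


def max_posterior_average(grid, posteriors, eps=1e-12):
    """Grid value maximizing the product of per-cell posteriors.

    `posteriors` is a list (one entry per cell) of probability lists over `grid`.
    Log-probabilities are summed to avoid numerical underflow of the product.
    """
    scores = [
        sum(math.log(p[i] + eps) for p in posteriors)
        for i in range(len(grid))
    ]
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best]


def posterior_odds_ratio(grid, posterior, x_bar, eps=1e-12):
    """Assumed form of the score: log2 odds that expression exceeds x_bar."""
    total = sum(posterior)
    above = sum(p for x, p in zip(grid, posterior) if x > x_bar) / total
    below = 1.0 - above
    return math.log2((above + eps) / (below + eps))
```

Under these assumptions, a cell whose posterior puts 80% of its mass above the average scores about +2 bits, and the mirror-image cell scores about −2 bits, so confident deviations in either direction are weighted symmetrically.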


For hierarchical clustering, we selected the 1,000 most variable genes of each population. We then used Ward linkage on Euclidean distances. Gap statistics was computed on the same hierarchical clustering function using the R package cluster. Random walk analysis52 was performed by constructing a 5-nearest-neighbour graph on correlation distances, initializing at a random node, and then simulating a series of random steps on the 5-connected graph. The local clustering coefficient of a node in such a graph quantifies the extent to which the neighbours of two connected cells are themselves connected to each other. It was computed using the transitivity function of the igraph package53.
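The graph-based steps can be sketched in plain Python as follows. The study used R and the igraph package; this stand-in only illustrates the idea. It assumes that the nearest-neighbour graph is made undirected by taking the union of each cell's neighbour lists, and the function names are hypothetical. Expression profiles with zero variance are not handled.

```python
import math
import random


def correlation_distance(u, v):
    """1 - Pearson correlation; undefined for constant vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


def knn_graph(profiles, k=5):
    """Undirected k-nearest-neighbour graph as {node: set(neighbours)}."""
    n = len(profiles)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted(
            (correlation_distance(profiles[i], profiles[j]), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize (assumption)
    return adj


def random_walk(adj, steps, seed=0):
    """Start at a random node and take `steps` uniformly random steps."""
    rng = random.Random(seed)
    node = rng.choice(sorted(adj))
    path = [node]
    for _ in range(steps):
        node = rng.choice(sorted(adj[node]))
        path.append(node)
    return path


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for a in range(k) for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))
```

A walk that stays within a small set of nodes and a high local clustering coefficient both point to a tightly connected group of cells, whereas a continuum tends to give walks that drift and lower coefficients.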


Basic set-up. To identify processes associated with the transition of HSCs to progenitor cell types, we sought a lower-dimensional representation of the HSPC data that reflects lineage priming. We therefore trained an elastic-net regularized generalized linear model (GLMNET) of the multinomial family on the most mature populations (N1-3, EBM, MD, spB1/2, E1/2 and Mk from Fig. 2a for individual 1, or lpB, EBM, N, ME and MD for individual 2), using class membership as the response variable. During this step, a number of population-specific genes was identified (Supplementary Table 3). The classifier then used the expression of these genes in all cells to estimate the probability $p_{ij}$ that a cell $i$ belongs to class $j$. From these probabilities, we compute the Kullback–Leibler distance from the average HSPC, which can be interpreted as the amount of lineage information a given cell has acquired:

$$S_{\mathrm{rel},i} = \sum_j p_{ij} \log \frac{p_{ij}}{\bar{p}_j},$$

where $\bar{p}_j$ is the average probability of a cell to belong to class $j$. We further assign each cell a predominant direction of priming, $d_i$. For displaying the six-dimensional vector $p_i$ in two dimensions, the developmental endpoints are arranged on the edge of a circle and all cells are placed in between. Each endpoint $k$ is assigned an angle $\alpha_k$. The class probabilities $p_{ik}$ are then transformed to Cartesian coordinates by

$$x_i = \sum_k p_{ik} \cos \alpha_k \quad \text{and} \quad y_i = \sum_k p_{ik} \sin \alpha_k.$$

To find the optimal arrangement of the developmental endpoints on the circle, lineages with common precursor stages are placed next to each other. A proximity is computed for each pair of lineages $l$ and $k$; all arrangements are tested and the arrangement with the highest proximity is chosen. This approach is based on a method termed ‘circular a posteriori projection’51.
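The two summary quantities used here, the amount of priming and the position of a cell on the circular plot, can be sketched from a matrix of class probabilities. This is an illustration under stated assumptions, not the STEMNET code: the Kullback–Leibler distance is taken in its standard form with natural logarithms, the average is computed over all rows passed in, and the Cartesian mapping is assumed to be the probability-weighted sum of the endpoint positions on the unit circle. All function names are hypothetical.

```python
import math


def mean_probabilities(P):
    """Column means of a cells-by-classes probability matrix."""
    n = len(P)
    return [sum(row[j] for row in P) / n for j in range(len(P[0]))]


def kl_from_average(p, p_bar):
    """Kullback-Leibler distance of one cell's class probabilities from p_bar.

    Classes with p_j = 0 contribute zero by the usual convention.
    """
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, p_bar) if pj > 0)


def circular_projection(p, angles):
    """Assumed mapping: probability-weighted mean of endpoints on the unit circle."""
    x = sum(pk * math.cos(a) for pk, a in zip(p, angles))
    y = sum(pk * math.sin(a) for pk, a in zip(p, angles))
    return x, y
```

With this mapping, a cell assigned entirely to one class lands on that endpoint, while a cell with equal probabilities for evenly spaced endpoints lands at the centre, which matches the picture of unprimed cells in the middle and committed cells at the rim.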

Data simulation. To test the ability of the STEMNET method to uncover binary branching events and discrete subpopulations, we quantitatively specified alternative models of cell fate specification and reshuffled our original data according to these models (Supplementary Fig. 6). In particular, we assumed that each cell is located on a binary tree, where nodes represent branching points and edges between nodes represent developmental trajectories. Each node Vi is specified by a tuple (E1, E2, p1, p2, h) with E1,2 pointing to the left and right child, p1,2 giving the probability that a cell adopts the fate associated with the left and right child (p1 + p2 = 1), and h ∈ [0,1] giving the height of the node (for developmental endpoints, h = 1, and for the root, h = 0). A cell is then defined by the tuple (h,E), where E points to the next node downstream of the cell.

For the scenario depicted in Supplementary Fig. 6a, cells were generated by randomly drawing values h from a Beta distribution with parameters (2,3). E was assigned by moving down a distance of h from the root and randomly choosing a branch according to p1,2 at each node that was passed. For the scenario depicted in Supplementary Fig. 6d cells were then scattered around the nearest node assuming an average distance of 0.01. The developmental distance D(ci, Vj) between a cell ci and a node Vj is then computed by traversing through the tree and summing all distances h that are passed along the way. For example, the distance between two developmental endpoints that diverge at a node with h  =  0.6 is 0.8. To generate synthetic data from these cell fate specification models, we extracted the coefficients of the STEMNET classifier (Supplementary Table 3), and for each developmental endpoint j compiled lists of genes with nonzero coefficient. Gene expression values for these genes were then reordered across cells i to follow the developmental distance D(ci, Vj) (that is, assuming that gene expression of lineage-specific genes was entirely determined by developmental distance, Supplementary Fig. 6a). Alternatively, gene expression values were randomly reshuffled such that the correlation between developmental distance from Vj and gene expression equals the empirically observed correlation between gene expression and pj from the STEMNET classifier (Supplementary Fig. 6b–d).
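The tree model and the developmental distance can be sketched as follows. This is a minimal illustration, not the simulation code used in the study. The class `Node` and the functions below are hypothetical names, and the distance is computed from the height at which the paths to the cell and to the target node diverge, which is one way to realize the traversal described above.

```python
import random


class Node:
    """Branching point or endpoint of the tree, with a height in [0, 1]."""

    def __init__(self, name, height, parent=None):
        self.name = name
        self.height = height
        self.parent = parent
        self.children = []  # list of (child, probability)

    def add_child(self, name, height, prob):
        child = Node(name, height, parent=self)
        self.children.append((child, prob))
        return child


def ancestors(node):
    """The node itself followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def lowest_common_ancestor(a, b):
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def developmental_distance(cell_h, cell_next, node):
    """Path length along the tree between a cell (h, E) and a node."""
    lca = lowest_common_ancestor(cell_next, node)
    # If the cell lies on the path to `node`, the paths diverge at the cell.
    branch_h = min(cell_h, lca.height)
    return (cell_h - branch_h) + (node.height - branch_h)


def sample_cell(root, rng):
    """Draw h from Beta(2, 3) and descend the tree to the next node below h."""
    h = rng.betavariate(2, 3)
    node = root
    while node.height < h and node.children:
        children, probs = zip(*node.children)
        node = rng.choices(children, weights=probs)[0]
    return h, node
```

For two endpoints that diverge at a node with h = 0.6, a cell sitting at one endpoint is at distance (1 − 0.6) + (1 − 0.6) = 0.8 from the other, in agreement with the worked example in the text.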

Quantitative link between single-cell transcriptomics and single-cell culture. To quantitatively link single-cell transcriptomic properties (such as the amount or direction of priming) to single-cell functional properties, we made use of FACS markers used in both experiments. In particular, for each transcriptomic property, we constructed a regression model with logicle transformed flow cytometry markers as explanatory variables and the property as a response variable. To achieve greater robustness than in standard linear regression, we applied GLMNET models of the normal family for this task, and used tenfold cross-validation to determine the regularization parameter λ. The regression coefficients of these models are shown in Supplementary Fig. 7a together with the R2 these models achieve in tenfold cross-validation if applied to the single-cell transcriptomic data. We then applied these classifiers to logicle transformed flow cytometry data from the single-cell culture experiment to estimate the magnitude of single-cell transcriptomic properties in that experiment. To further improve the classifier, we also included rank-transformed mRNA expression levels of TFRC (CD71) and KEL in the training data, and rank-transformed flow cytometry data of CD71 and KEL in the single-cell culture experiment.

Identification of gene clusters associated with lineage priming. We then identified genes whose expression depends on Srel, d, or both, by separately fitting four different linear models to the expression data of each gene. The first model describes gene expression as a function of the predominant direction d, which is a categorical variable. It best fits to genes that are up- or downregulated early during developmental progression in a certain direction and stay unchanged until the end. The second model describes gene expression as a function of a third-degree polynomial through log10Srel. It best fits to genes that are up- or downregulated at a specific stage of developmental progression, independent of the developmental direction. The third model describes gene expression as a function of d, a third-degree polynomial through log10Srel and the interaction of d and log10Srel. It best fits to genes that are up- or downregulated at a specific stage of development in a specific direction. The fourth model describes gene expression as a constant. It best fits to genes that do not change systematically during acquisition of lineage fate. For each gene, we identified the optimal model by comparing the models’ Bayesian Information Criteria (BIC). For each class of genes (dependent on log10Srel, d or both) separately, we identified subgroups of genes that display similar dependencies on log10Srel and d by performing hierarchical clustering using correlation distance and complete linkage on the fitted values from the preferred model.
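The model selection step can be illustrated with the two simplest of the four models, the constant model and the direction-only model, whose least-squares fits are just the overall mean and the per-direction means. This sketch is not the authors' code. It assumes Gaussian errors, uses the common form BIC = n ln(RSS/n) + k ln(n) up to an additive constant, and counts only the mean parameters in k. The polynomial and interaction models are omitted, and the function names are hypothetical.

```python
import math


def bic(rss, n, k):
    """BIC of a Gaussian linear model, up to an additive constant."""
    rss = max(rss, 1e-12)  # guard against log(0) for perfect fits
    return n * math.log(rss / n) + k * math.log(n)


def preferred_model(expr, direction):
    """Compare a constant model with a per-direction mean model for one gene."""
    n = len(expr)
    grand_mean = sum(expr) / n
    rss_const = sum((y - grand_mean) ** 2 for y in expr)

    groups = {}
    for y, d in zip(expr, direction):
        groups.setdefault(d, []).append(y)
    rss_dir = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss_dir += sum((y - m) ** 2 for y in ys)

    scores = {
        "constant": bic(rss_const, n, 1),
        "direction": bic(rss_dir, n, len(groups)),
    }
    return min(scores, key=scores.get), scores
```

The penalty term k ln(n) is what keeps the richer model from winning by default: a gene with the same mean in every direction gains nothing in fit from the extra parameters and is therefore assigned to the constant model.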

Statistics and reproducibility.

Single-cell RNA-seq was performed on two different individuals. Totals of 1,034 (for I1) and 379 cells (for I2) were included into the study. Single-cell culture was performed for 2,038 cells. As indicated in the figure legends, P values are computed from the Pearson product moment correlation test, kernel-density-based global two-sample comparison test or two-tailed unpaired t-test.

For animal experiments, no statistical method was used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to animal allocation during experiments and outcome assessment.

Code availability.

Most analyses were performed in indeXplorer, a custom-made software for the analysis of single-cell index-sorting/transcriptomic data sets. indeXplorer was written in R and relies on the package shiny; code is available from https://git.embl.de/velten/indeXplorer.

For analyses that were not performed in indeXplorer directly, we provide an R package containing all code at https://git.embl.de/velten/STEMNET.

Data availability.

RNA-seq data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) under accession code GSE75478. Processed data are available at http://steinmetzlab.embl.de/shiny/indexplorer/?launch=yes for browsing. All other data supporting the findings of this study are available from the corresponding author on reasonable request.





1. Establishment of a normal hematopoietic and leukemia stem cell hierarchy. Cold Spring Harb. Symp. Quant. Biol. 73, 439–449 (2008).
2. The biology of hematopoietic stem cells. Annu. Rev. Cell Dev. Biol. 11, 35–71 (1995).
3. Identification of clonogenic common lymphoid progenitors in mouse bone marrow. Cell 91, 661–672 (1997).
4. A clonogenic common myeloid progenitor that gives rise to all myeloid lineages. Nature 404, 193–197 (2000).
5. Revised map of the human progenitor hierarchy shows the origin of macrophages and dendritic cells in early lymphoid development. Nat. Immunol. 11, 585–593 (2010).
6. Isolation of single human hematopoietic stem cells capable of long-term multilineage engraftment. Science 333, 218–221 (2011).
7. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351, aab2116 (2016).
8. Clonal dynamics of native haematopoiesis. Nature 514, 322–327 (2014).
9. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546 (2015).
10. The branching point in erythro-myeloid differentiation. Cell 163, 1655–1662 (2015).
11. Inflammation-induced emergency megakaryopoiesis driven by hematopoietic stem cell-like megakaryocyte progenitors. Cell Stem Cell 17, 422–434 (2015).
12. Revision of the human hematopoietic tree: granulocyte subtypes derive from distinct hematopoietic lineages. Cell Rep. 3, 1539–1552 (2013).
13. Identification of Flt3+ lympho-myeloid stem cells lacking erythro-megakaryocytic potential. Cell 121, 295–306 (2005).
14. Clonal analysis unveils self-renewing lineage-restricted progenitors generated directly from hematopoietic stem cells. Cell 154, 1112–1126 (2013).
15. Diverse and heritable lineage imprinting of early haematopoietic progenitors. Nature 496, 229–232 (2013).
16. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
17. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724 (2015).
18. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
19. Single-cell RNA-Seq with waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).
20. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
21. The transcriptional program of terminal granulocytic differentiation. Blood 105, 1785–1796 (2005).
22. Neutrophils, from marrow to microbes. Immunity 33, 657–670 (2010).
23. Orchestrating B cell lymphopoiesis through interplay of IL-7 receptor and pre-B cell receptor signalling. Nat. Rev. Immunol. 14, 69–80 (2013).
24. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
25. Early myeloid lineage choice is not initiated by random PU.1 to GATA1 protein ratios. Nature 535, 299–302 (2016).
26. HOXB6 overexpression in murine bone marrow immortalizes a myelomonocytic precursor in vitro and causes hematopoietic stem cell expansion and acute myeloid leukemia in vivo. Blood 105, 1456–1466 (2005).
27. HoxA3 is an apical regulator of haemogenic endothelium. Nat. Cell Biol. 13, 72–78 (2011).
28. Prdm16 promotes stem cell maintenance in multiple tissues, partly by regulating oxidative stress. Nat. Cell Biol. 12, 999–1006 (2010).
29. Metabolic requirements for the maintenance of self-renewing stem cells. Nat. Rev. Mol. Cell Biol. 15, 243–256 (2014).
30. Hierarchical and ontogenic positions serve to define the molecular basis of human hematopoietic stem cell behavior. Dev. Cell 8, 651–663 (2005).
31. Evi1 is essential for hematopoietic stem cell self-renewal, and its expression marks hematopoietic cells with long-term multilineage repopulating activity. J. Exp. Med. 208, 2403–2416 (2011).
32. GATA-3 regulates the self-renewal of long-term hematopoietic stem cells. Nat. Immunol. 14, 1037–1044 (2013).
33. From stem cell to red cell: regulation of erythropoiesis at multiple levels by multiple proteins, RNAs, and chromatin modifications. Blood 118, 6258–6269 (2011).
34. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
35. Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).
36. Transcriptional control of granulocyte and monocyte development. Oncogene 26, 6816–6828 (2007).
37. Characterization of early stages of human B cell development by gene expression profiling. J. Immunol. 179, 3662–3671 (2007).
38. IRF8 inhibits C/EBPα activity to restrain mononuclear phagocyte progenitors from differentiating into neutrophils. Nat. Commun. 5, 4978 (2014).
39. The Strategy of the Genes (Routledge, 1957).
40. Non-genetic heterogeneity—a mutation-independent driving force for the somatic evolution of tumours. Nat. Rev. Genet. 10, 336–342 (2009).
41. Non-genetic heterogeneity of cells in development: more than just noise. Development 136, 3853–3862 (2009).
42. Human natural killer cell development. Immunol. Rev. 214, 56–72 (2006).
43. IFNα activates dormant haematopoietic stem cells in vivo. Nature 458, 904–908 (2009).
44. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
45. Quartz-Seq: a highly reproducible and sensitive single-cell RNA-Seq reveals non-genetic gene expression heterogeneity. Genome Biol. 14, R31 (2013).
46. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24, 2033–2040 (2014).
47. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
48. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
49. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
50. Single-cell polyadenylation site mapping reveals 3’ isoform choice variability. Mol. Syst. Biol. 11, 812 (2015).
51. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
52. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
53. The igraph software package for complex network research. InterJournal Complex Systems, 1695 (2006).

Acknowledgements

We thank C. Drumm for help with 3D graphics, K. Hexel, S. Schmitt, C. Felbinger and M. Eich from the DKFZ flow cytometry facility for flow cytometry support, the EMBL Genomics Core Facility for sequencing and R. Aiyar, A. Jones, M. Milsom and all members of HI-STEM and the Steinmetz group for helpful discussions on the manuscript as well as T. Schroeder and D. Löffler for initial discussions. This work was supported by the SFB873 funded by the Deutsche Forschungsgemeinschaft (DFG) (to C.L., M.A.G.E. and A.T.), the Dietmar Hopp Foundation (to M.A.G.E. and A.T.) and the US National Institutes of Health (P01 HG000205 to L.M.S.).

Author information

Author notes

    • Lars Velten, Simon F. Haas & Simon Raffel

    These authors contributed equally to this work.

    • Andreas Trumpp, Marieke A. G. Essers & Lars M. Steinmetz

    These authors jointly supervised this work.


Affiliations

  1. European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany

    • Lars Velten, Bianca P. Hennig, Wolfgang Huber & Lars M. Steinmetz

  2. Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), 69120 Heidelberg, Germany

    • Simon F. Haas, Simon Raffel, Sandra Blaszkiewicz, Christoph Hirche, Andreas Trumpp & Marieke A. G. Essers

  3. Division of Stem Cells and Cancer, Haematopoietic Stem Cells and Stress Group, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany

    • Simon F. Haas, Sandra Blaszkiewicz, Christoph Hirche & Marieke A. G. Essers

  4. Division of Stem Cells and Cancer and DKFZ-ZMBH Alliance, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany

    • Simon F. Haas, Simon Raffel & Andreas Trumpp

  5. Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany

    • Simon Raffel, Christoph Lutz, Eike C. Buss & Anthony D. Ho

  6. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA

    • Saiful Islam & Lars M. Steinmetz

  7. Department of Hematology and Oncology, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany

    • Daniel Nowak, Tobias Boch & Wolf-Karsten Hofmann

  8. German Cancer Consortium (DKTK), 69120 Heidelberg, Germany

    • Andreas Trumpp

  9. Stanford Genome Technology Center, Palo Alto, California 94304, USA

    • Lars M. Steinmetz


Contributions

S.F.H., S.R., L.V., S.B. and C.H. performed the experiments. L.V. analysed the data, with conceptual input from S.F.H., S.R., L.M.S., M.A.G.E. and A.T., and analytical advice from W.H. S.I. and B.P.H. optimized genomics methods. C.L., E.C.B., D.N., T.B., W.-K.H. and A.D.H. obtained bone marrow aspirates. L.V., S.F.H., S.R., M.A.G.E., L.M.S. and A.T. jointly conceived and designed the study, and wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Andreas Trumpp or Marieke A. G. Essers or Lars M. Steinmetz.

Supplementary information

PDF files

  1. Supplementary Information

Excel files

  1. Supplementary Table 1

  2. Supplementary Table 2

  3. Supplementary Table 3

  4. Supplementary Table 4

About this article

Publication history