FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data


Abstract

To understand stem cell differentiation along multiple lineages, it is necessary to resolve heterogeneous cellular states and the ancestral relationships between them. We developed a robotic miniaturized CEL-Seq2 implementation to carry out deep single-cell RNA-seq of 2,000 mouse hematopoietic progenitors enriched for lymphoid lineages, and used an improved clustering algorithm, RaceID3, to identify cell types. To resolve subtle transcriptome differences indicative of lineage biases, we developed FateID, an iterative supervised learning algorithm for the probabilistic quantification of cell fate bias in progenitor populations. Here we used FateID to delineate domains of fate bias and enable the derivation of high-resolution differentiation trajectories, thereby revealing a common progenitor population of B cells and plasmacytoid dendritic cells, which we validated by in vitro differentiation assays. We expect that FateID will improve understanding of the process of cell fate choice in complex multi-lineage differentiation systems.


Figure 1: Elucidating transcriptome heterogeneity of multipotent hematopoietic progenitors by scRNA-seq.
Figure 2: FateID quantifies lineage bias in multipotent progenitor populations.
Figure 3: The importance of genes for lineage classification depends on the differentiation stage.
Figure 4: FateID identifies a common progenitor population of B cells and pDCs.
Figure 5: In vitro differentiation assay confirmed predicted lymphoid progenitor populations of pDCs and B cells.

Accession codes

Primary accessions

Gene Expression Omnibus

References

1. Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
2. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
3. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
4. Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20–e31 (2016).
5. Drissen, R. et al. Distinct myeloid progenitor–differentiation pathways identified through single-cell RNA sequencing. Nat. Immunol. 17, 666–676 (2016).
6. Perié, L., Duffy, K.R., Kok, L., de Boer, R.J. & Schumacher, T.N. The branching point in erythro-myeloid differentiation. Cell 163, 1655–1662 (2015).
7. Busch, K. et al. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546 (2015).
8. Yu, V.W.C. et al. Epigenetic memory underlies cell-autonomous heterogeneous behavior of hematopoietic stem cells. Cell 167, 1310–1322 (2016).
9. Orkin, S.H. & Zon, L.I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell 132, 631–644 (2008).
10. Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F. & Theis, F.J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
11. Setty, M. et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).
12. Marco, E. et al. Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape. Proc. Natl. Acad. Sci. USA 111, E5643–E5650 (2014).
13. Chen, J., Schlitzer, A., Chakarov, S., Ginhoux, F. & Poidinger, M. Mpath maps multi-branching single-cell trajectories revealing progenitor cell progression during development. Nat. Commun. 7, 11988 (2016).
14. Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
15. Svensson, V. et al. Power analysis of single-cell RNA-sequencing experiments. Nat. Methods 14, 381–387 (2017).
16. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643 (2017).
17. Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
18. Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).
19. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
20. Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
21. Kiselev, V.Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
22. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
23. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
24. Satpathy, A.T., Wu, X., Albring, J.C. & Murphy, K.M. Re(de)fining the dendritic cell lineage. Nat. Immunol. 13, 1145–1154 (2012).
25. Poltorak, M.P. & Schraml, B.U. Fate mapping of dendritic cells. Front. Immunol. 6, 199 (2015).
26. Welner, R.S. et al. Asynchronous RAG-1 expression during B lymphopoiesis. J. Immunol. 183, 7768–7777 (2009).
27. Corcoran, L. et al. The lymphoid past of mouse plasmacytoid cells and thymic dendritic cells. J. Immunol. 170, 4926–4932 (2003).
28. Inlay, M.A. et al. Ly6d marks the earliest stage of B-cell specification and identifies the branchpoint between B-cell and T-cell development. Genes Dev. 23, 2376–2381 (2009).
29. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
30. Pietras, E.M. et al. Functionally distinct subsets of lineage-biased multipotent progenitors control blood production in normal and regenerative conditions. Cell Stem Cell 17, 35–46 (2015).
31. Notta, F. et al. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351, aab2116 (2016).
32. Cabezas-Wallscheid, N. et al. Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Cell Stem Cell 15, 507–522 (2014).
33. Sathe, P., Vremec, D., Wu, L., Corcoran, L. & Shortman, K. Convergent differentiation: myeloid and lymphoid pathways to murine plasmacytoid dendritic cells. Blood 121, 11–19 (2013).
34. Onai, N. et al. A clonogenic progenitor with prominent plasmacytoid dendritic cell developmental potential. Immunity 38, 943–957 (2013).
35. Onai, N. et al. Identification of clonogenic common Flt3+M-CSFR+ plasmacytoid and conventional dendritic cell progenitors in mouse bone marrow. Nat. Immunol. 8, 1207–1216 (2007).
36. Medina, K.L. et al. Separation of plasmacytoid dendritic cells from B-cell-biased lymphoid progenitor (BLP) and pre-pro B cells using PDCA-1. PLoS One 8, e78408 (2013).
37. Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 (2017).
38. See, P. et al. Mapping the human DC lineage through the integration of high-dimensional techniques. Science 356, eaag3009 (2017).
39. Naik, S.H. et al. Development of plasmacytoid and conventional dendritic cell subtypes from single precursor cells derived in vitro and in vivo. Nat. Immunol. 8, 1217–1226 (2007).
40. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
41. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
42. Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).
43. Ritchie, M.E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
44. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
45. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2570–2605 (2008).


Acknowledgements

We thank A. Pospisilik for help in developing mCEL-Seq2. We acknowledge extensive support from S. Hobitz, K. Schuldes, and D. Wild in flow cytometry, and U. Boenisch in deep sequencing. We also thank T. Boehm, C. Happe, and R. Grosschedl for valuable input and support. We thank T. Boehm, N. Cabezas-Wallscheid, and E. Trompouki for critical reading of the manuscript and valuable feedback. S. acknowledges funding from the Behrens-Weise Foundation. This work was supported by the Max Planck Society.

Author information

J.S.H. performed all experiments, analyzed the data, and created the web interface; S. established the mCEL-Seq2 protocol with the help of J.S.H.; D.G. designed the study, developed the algorithm, and wrote the paper; and all authors edited the manuscript.

Correspondence to Dominic Grün.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Miniaturized high-throughput version of CEL-Seq2 maintains high sensitivity and accuracy.

(a-e) Violin plots comparing manual CEL-Seq2 with the robotic version at different volume reductions for the distributions of (a) the number of transcripts per mESC, (b) the fraction of recovered ERCC (ref. 1) spike-in RNA, (c) the number of genes per mESC, (d) Pearson's correlation coefficient between measured and actual spike-in numbers, and (e) Pearson's correlation coefficient between spike-in levels measured in distinct cells. In (a-e), violin plots represent the density distribution of the data. Overlaid boxplots show the median (white dot) and the interquartile range (box limits). The whiskers extend to the most extreme data point within 1.5 times the interquartile range from the box. Outliers are indicated. The sample size in (a-e) is 48 cells per group from n = 1 experiment. (f, g) Dependence of sequencing efficiency on sequence composition. The average sequenced spike-in number was regressed on the actual spike-in number with the intercept fixed at zero. The scatter plots show how the deviation of the measured spike-in level from the regression line, normalized by the average expression, depends on (f) percentage GC content and (g) sequence length. Data points for 96 ERCC spike-in sequences are shown in (f) and (g). Shown data are from one experiment.
1. Baker, S. C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–4 (2005).
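The composition-bias analysis in (f, g) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it fits measured = b · actual through the origin by least squares and divides each spike-in's residual by its measured average, which is one possible reading of "normalized by the average expression". The helper names `fit_through_origin`, `normalized_deviation` and `pearson`, and all numbers in the toy example, are hypothetical.

```python
import math

def fit_through_origin(actual, measured):
    """Least-squares slope b for measured ~ b * actual with the intercept fixed at zero."""
    return sum(a * m for a, m in zip(actual, measured)) / sum(a * a for a in actual)

def normalized_deviation(actual, measured):
    """Residual of each spike-in from the zero-intercept fit, divided by its measured average.
    Assumption: 'normalized by the average expression' is read as division by the measured value."""
    b = fit_through_origin(actual, measured)
    return [(m - b * a) / m for a, m in zip(actual, measured)]

def pearson(x, y):
    """Pearson correlation, e.g. between normalized deviation and GC content or length."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy example with made-up numbers: four spike-ins, their input amounts and mean measured counts.
actual = [10.0, 20.0, 40.0, 80.0]
measured = [9.0, 21.0, 38.0, 84.0]
gc_percent = [35.0, 48.0, 42.0, 55.0]
dev = normalized_deviation(actual, measured)
print(round(fit_through_origin(actual, measured), 3), [round(d, 3) for d in dev])
print(round(pearson(dev, gc_percent), 3))
```

Fixing the intercept at zero encodes the expectation that zero input molecules yield zero reads, so any systematic trend of the residuals with GC content or length points to composition-dependent efficiency.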

Supplementary Figure 2 Gate settings used for index sorting of hematopoietic progenitors.

Two different sorting schemes were used. In a first experiment with n = 2 mice (sorting scheme 1), we purified 384 lineage-negative (Lin−) Kit-high Sca-1-high (LSK) cells, that is, cells with high surface expression of Kit and Sca-1 (encoded by Ly6a), a permissive sorting strategy to sample the pool of HSCs and multipotent progenitors. In addition, we sorted 768 lymphoid-primed multipotent progenitors (LMPPs) as Flt3+ LSK cells, as well as 384 Lin− Kit-low Sca-1-low cells and 192 Lin− Kit-low Sca-1-low/negative Flt3-high common lymphoid progenitors (CLPs), to enrich for the lymphoid branch. We deliberately did not gate on Il7r in order to sample CLPs comprehensively (refs. 1–3). To capture more mature and potentially underrepresented progenitor states of all hematopoietic lineages, we subsequently applied a second sorting strategy (sorting scheme 2). Here, we sequenced 1,152 cells from 16 adjacent windows spanning different ranges of Kit and Sca-1 expression and being either Flt3+ or Flt3−. Cells in (a) and (b) were sorted from n = 2 mice each in another independent experiment. Flow cytometry data are from one experiment with n = 1 mouse.
1. Inlay, M. A. et al. Ly6d marks the earliest stage of B-cell specification and identifies the branchpoint between B-cell and T-cell development. Genes Dev. 23, 2376–81 (2009).
2. Mansson, R. et al. Single-cell analysis of the common lymphoid progenitor compartment reveals functional and molecular heterogeneity. Blood 115, 2601–2609 (2010).
3. Tsapogas, P. et al. IL-7 mediates Ebf-1-dependent lineage restriction in early lymphoid progenitors. Blood 118, 1283–1290 (2011).

Supplementary Figure 3 Single-cell sequencing of index-sorted lymphoid progenitor populations.

(a) Boxplot of the distribution of transcript counts per cell for LSK cells, LMPPs and CLPs. The sample sizes are 370, 748, and 494 cells for LSK cells, LMPPs, and CLPs, respectively, sorted in two independent experiments with n = 2 mice. (b, c) Saturation analysis. (b) The number of UMIs detected in cells of the LMPP sample as a function of the fraction of reads used for analysis. (c) The number of genes detected in cells of the LMPP sample as a function of the fraction of reads used for analysis. (d) To benchmark the quality of our dataset, we compared Cd34+ LMPPs to mouse (and human) Cd34+ (CD34+) datasets generated with different sequencing technologies (refs. 1–3), including Smart-seq2 (ref. 4), the commercial 10x GemCode technology, and MARS-seq (ref. 5). Boxplot of the UMI count distribution in Cd34+ (or CD34+) cells from different mouse (or human) hematopoietic progenitor datasets. Herman: Cd34+ LMPPs from this study (725 cells from n = 2 mice). Paul (ref. 2): Cd34+ cells from the common myeloid progenitor gate sequenced with MARS-seq (2,370 cells from n = 4 mice). Zheng (PB) (ref. 3): CD34 mRNA-positive cells from the CD34 surface protein–positive peripheral blood population generated with 10x GemCode technology (3,213 cells from n = 1 donor). Zheng (BM) (ref. 3): CD34+ cells from post-transplantation bone marrow of an acute myeloid leukemia patient (AML027) generated with 10x GemCode technology (290 cells from n = 1 donor). Selection was always based on Cd34 (CD34) mRNA levels. (e) Boxplot of the number of detected genes per cell for the same samples as in (d) and for a high-sensitivity dataset sequenced with a non-UMI-based full-length coverage technology. Velten (ref. 1): CD34+ human bone marrow cells from individual 1 sequenced with Smart-seq2 (1,035 cells from n = 1 donor). In (a-e), the bold line indicates the median and box limits represent the interquartile range. The whiskers extend to the most extreme data point within 1.5 times the interquartile range from the box. Outliers are indicated.
(f-j) t-SNE maps highlighting fluorescence intensity measured by index sorting, which enables simultaneous quantification of the transcriptome and cell surface marker expression, for (f) Kit, (g) Sca-1, (h) Flt3, (i) Il7r, and (j) Ly6d. In (f-j), 1,949 cells from n = 4 animals are shown. (k) Barplot of Spearman's correlation coefficient between surface protein expression quantified by fluorescence intensity and mRNA expression measured by single-cell RNA-seq in the same cell. The correlation coefficient was calculated for 1,949 cells from three independent experiments with n = 4 mice.
1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
2. Paul, F. et al. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell 163, 1663–1677 (2015).
3. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
4. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–8 (2013).
5. Jaitin, D. A. et al. Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types. Science 343, 776–779 (2014).
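The saturation analysis in (b, c) amounts to subsampling reads and counting what remains detectable. A minimal sketch, assuming each read has already been reduced to a (cell, gene, UMI) triple and that a UMI is counted once per gene and cell; no UMI-collision correction is applied, and the function name `saturation_curve` and the toy reads are hypothetical.

```python
import random

def saturation_curve(reads, fractions, seed=0):
    """reads: list of (cell, gene, umi) triples.
    Returns {fraction: (mean UMIs per cell, mean genes per cell)} after random read subsampling."""
    rng = random.Random(seed)
    cells = sorted({cell for cell, _, _ in reads})
    curve = {}
    for f in fractions:
        subsample = rng.sample(reads, round(f * len(reads)))
        umis = {cell: set() for cell in cells}
        genes = {cell: set() for cell in cells}
        for cell, gene, umi in subsample:
            umis[cell].add((gene, umi))  # duplicate reads of one molecule collapse here
            genes[cell].add(gene)
        curve[f] = (
            sum(len(s) for s in umis.values()) / len(cells),
            sum(len(s) for s in genes.values()) / len(cells),
        )
    return curve

# Toy input: two cells, with one PCR duplicate in cell c1.
reads = [("c1", "Kit", "AAC"), ("c1", "Kit", "AAC"), ("c1", "Flt3", "GTT"), ("c2", "Kit", "CGA")]
print(saturation_curve(reads, [0.5, 1.0]))
```

A curve that still rises steeply at the full read depth indicates that deeper sequencing would recover more molecules; a plateau indicates saturation.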

Supplementary Figure 4 Expression domains of lineage marker genes.

Shown are t-SNE maps highlighting log2-transformed normalized transcript levels of Kit and Ly6a (multipotency), Flt3 and Il7r (lymphoid lineage), Rag1 and Ebf1 (B cell lineage), Irf8 and Siglech (plasmacytoid dendritic cells), Cd79a (pre-B cells), Cd74 and Itgax (conventional dendritic cells), Mpo (granulocyte/monocyte lineage), Elane (neutrophils), Car2, Gata1 and Hbb-bs (erythrocyte lineage), Thy1 (innate lymphoid cells), Ncr1 (natural killer cells), Icos and Gata3 (type 2 innate lymphoid cell (ILC2) lineage), Cd3d (NKT cells), and Pf4 (megakaryocyte lineage). All t-SNE maps show 1,949 cells from three independent experiments with n = 4 mice.

Supplementary Figure 5 Benchmarking of RaceID3.

RaceID3 shows superior performance in recovering domains of marker gene expression compared with RCA (ref. 1), SC3 (ref. 2), Seurat (ref. 3), and ICGS (ref. 4). RaceID3 was run with and without random forests-based reclassification (rf). All clustering methods except ICGS were run with different parameters to vary sensitivity and obtain different cluster numbers (see Online Methods). With this strategy, overlapping ranges of cluster numbers were obtained for each method; ICGS alone lacks a parameter for adjusting sensitivity. Shown are the maximum log2-transformed fold-enrichment (left panel) and the entropy of the distribution of mean expression across all detected clusters (right panel) for each lineage marker gene, as a function of cluster number. Compared with all other tested methods, the fold-enrichment of the RaceID3 predictions is substantially higher for most lineage markers and of similar magnitude to the best-performing methods for the remaining ones. At the same time, the entropy as a function of cluster number is consistently lower with RaceID3 for most marker genes and follows a similar trend for the remaining ones. In conclusion, the benchmarking demonstrates that RaceID3 maximizes the overlap of known marker gene expression domains with predicted cell types.
1. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
2. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
3. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
4. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
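The two benchmarking scores can be sketched per marker gene as below. This is an illustrative reading of the caption, not the published benchmarking code: fold-enrichment is assumed to be taken relative to the mean over all cells with a small pseudocount, and the entropy is taken as the Shannon entropy (in bits) of the cluster means rescaled to sum to one. The pseudocount, the helper name `marker_scores` and the toy data are assumptions.

```python
import math

def marker_scores(expr, clusters, pseudocount=0.1):
    """expr: normalized expression of one marker per cell; clusters: cluster label per cell.
    Returns (maximum log2 fold-enrichment over clusters, entropy of cluster means in bits)."""
    overall = sum(expr) / len(expr)
    members = {}
    for value, label in zip(expr, clusters):
        members.setdefault(label, []).append(value)
    means = [sum(v) / len(v) for v in members.values()]
    # Assumption: enrichment is relative to the mean over all cells.
    max_log2_fc = max(math.log2((m + pseudocount) / (overall + pseudocount)) for m in means)
    total = sum(means)
    if total == 0:
        return max_log2_fc, 0.0  # marker not expressed at all
    probs = [m / total for m in means if m > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return max_log2_fc, entropy

# A marker confined to cluster 1 versus one spread evenly over both clusters.
print(marker_scores([4, 4, 0, 0], [1, 1, 2, 2]))
print(marker_scores([1, 1, 1, 1], [1, 1, 2, 2]))
```

Both scores reward clusterings that concentrate a marker in few clusters: a good clustering yields high maximum enrichment and low entropy.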

Supplementary Figure 6 FateID analysis of mouse hematopoietic progenitors.

(a, b) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 1; middle) is shown along with log2-transformed aggregated normalized expression of two lineage markers. Fate bias and marker gene expression are shown for (a) the conventional dendritic cell lineage and (b) the innate lymphoid lineage. In (a) and (b), data for 1,802 cells from n = 4 animals are shown. (c-h) Scatterplots of normalized expression levels of Kit and a lineage marker, with the predicted fate bias color-coded: (c) Mpo and neutrophil fate bias, (d) Tcf4 and pDC fate bias, (e) Ebf1 and B cell fate bias, (f) Car2 and erythrocyte fate bias, (g) Cd74 and cDC fate bias, and (h) Tcf7 and NK/NKT/ILC2 fate bias. In each case, fate bias increases with expression of the lineage marker and is inversely related to the level of Kit. (i-n) Scatterplots comparing fate bias predicted by FateID and STEMNET for (i) the neutrophil, (j) the pDC, (k) the B cell, (l) the erythrocyte, (m) the cDC, and (n) the NK/NKT/ILC lineage.
Although the predictions are correlated overall, STEMNET predicts rather uniform levels across a large fraction of the multipotent cell population, suggesting that every cell is multipotent, whereas FateID predictions span the full range of possible values for all lineages, comprising cells with zero fate probability, intermediate values, or substantial bias towards a given lineage. The higher resolution of FateID is potentially explained by the dynamic composition of its training set, which comprises more mature cells during the first iterations and includes earlier differentiation stages in later iterations to classify more naïve cells. In contrast to STEMNET, this strategy avoids classifying naïve cells with genes expressed only at late stages of differentiation. In all panels, data for 1,802 cells from three independent experiments with n = 4 mice are shown.
1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
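The iterative scheme with a moving training set can be illustrated with a toy sketch. This is not the FateID implementation and makes several loud assumptions: a bagged nearest-centroid vote stands in for the random forests classifier, each round classifies the unassigned cells closest to each lineage's current training set, and only the `window` most recently added cells of each lineage are used for training, so that training moves from the mature target clusters towards more naïve cells. The fate bias of a cell is its fraction of votes per lineage. All function and parameter names (`iterative_fate_bias`, `batch`, `window`, `n_bags`) are hypothetical.

```python
import math
import random
from collections import Counter

def centroid(points):
    return [sum(coord) / len(points) for coord in zip(*points)]

def bagged_votes(train, x, n_bags, rng):
    """Stand-in for random forests: bootstrap each lineage's training cells, vote for the nearest centroid."""
    votes = Counter()
    for _ in range(n_bags):
        best, best_d = None, math.inf
        for lineage, pts in train.items():
            boot = [rng.choice(pts) for _ in pts]
            d = math.dist(centroid(boot), x)
            if d < best_d:
                best, best_d = lineage, d
        votes[best] += 1
    return {lineage: votes[lineage] / n_bags for lineage in train}

def iterative_fate_bias(cells, targets, batch=1, window=4, n_bags=50, seed=0):
    """cells: list of expression vectors; targets: {lineage: [indices of target-cluster cells]}.
    Returns {cell index: {lineage: vote fraction}}."""
    rng = random.Random(seed)
    history = {lin: [cells[i] for i in idx] for lin, idx in targets.items()}
    bias = {}
    for lin, idx in targets.items():
        for i in idx:
            bias[i] = {other: float(other == lin) for other in targets}
    while len(bias) < len(cells):
        # Train only on the most recently added cells of each lineage (the moving training set).
        train = {lin: pts[-window:] for lin, pts in history.items()}
        new = []
        for lin, pts in train.items():
            remaining = [i for i in range(len(cells)) if i not in bias and i not in new]
            remaining.sort(key=lambda i: min(math.dist(cells[i], p) for p in pts))
            new.extend(remaining[:batch])
        for i in new:
            bias[i] = bagged_votes(train, cells[i], n_bags, rng)
            winner = max(bias[i], key=bias[i].get)
            history[winner].append(cells[i])
    return bias

# Toy 2D data: two "mature" target clusters (A, B) and a path of intermediate cells.
cells = [[0.0, 0.0], [0.2, 0.0], [10.0, 0.0], [10.2, 0.0], [1.0, 0.0],
         [9.0, 0.0], [2.0, 0.0], [8.0, 0.0], [5.0, 0.0]]
targets = {"A": [0, 1], "B": [2, 3]}
bias = iterative_fate_bias(cells, targets)
print({i: bias[i] for i in (6, 8)})
```

In this toy run, cells near either end receive unambiguous bias, while the midpoint cell is classified last against the most naïve training cells and can receive split votes, which is the graded behaviour the caption contrasts with STEMNET.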

Supplementary Figure 7 Classifier genes depend on differentiation stage.

(a-e) The heatmaps show genes with a random forests importance measure >0.02 and a ratio of absolute importance to its standard deviation >2 in at least one iteration. Iterations are depicted on the x axis, with the first iteration on the left and the final iteration on the right. Early iterations correspond to more mature stages, whereas late iterations correspond to more naïve stages. A hierarchical clustering dendrogram is shown on the left margin. Heatmaps are shown for (a) the neutrophil, (b) the pDC, (c) the erythrocyte, (d) the cDC, and (e) the NK/NKT/ILC lineage. (f) Heatmap of Spearman's correlation coefficient between the fate bias predicted by FateID and cell surface marker expression. An elastic-net-regularized linear regression with a Gaussian (normal) family, as used by Velten et al. (ref. 1), confirms the same trends, but the correlation-based analysis better discriminates the subpopulations corresponding to distinct lineages. In (a-f), data derived from 1,802 cells from three independent experiments with n = 4 mice are shown.
1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
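The gene-selection rule used for these heatmaps can be written as a simple filter. A minimal sketch, assuming per-iteration importance values and their standard deviations are already available as lists per gene; the function name and the toy values are hypothetical. The ratio test is written as a multiplication so that a zero standard deviation does not raise a division error.

```python
def select_classifier_genes(importance, sd, min_importance=0.02, min_ratio=2.0):
    """importance, sd: {gene: [value per iteration]}.
    Keep a gene if, in at least one iteration, importance > min_importance and
    |importance| / sd > min_ratio (tested as |importance| > min_ratio * sd)."""
    selected = []
    for gene, values in importance.items():
        if any(v > min_importance and abs(v) > min_ratio * s for v, s in zip(values, sd[gene])):
            selected.append(gene)
    return selected

# Toy values for two iterations.
importance = {"Mpo": [0.05, 0.01], "Kit": [0.03, 0.03], "Actb": [0.001, 0.001]}
sd = {"Mpo": [0.01, 0.01], "Kit": [0.02, 0.02], "Actb": [0.0001, 0.0001]}
print(select_classifier_genes(importance, sd))
```

Here Kit passes the importance cutoff but not the stability cutoff, so only Mpo is retained; requiring both guards against genes whose importance is large but highly variable across trees.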

Supplementary Figure 8 FateID identifies progenitor stages of the pDC lineage.

(a) Self-organizing map (SOM) of z-score-transformed pseudo-temporal expression profiles along the pDC developmental trajectory derived from the t-SNE map in Fig. 3a. Example profiles are shown for four genes dynamically expressed during pDC differentiation. The black line indicates a local regression. The SOM has been computed for 711 cells with predicted pDC fate bias >0.15 derived from three independent experiments with n = 4 mice. (b) Shown are t-SNE maps highlighting normalized transcript levels of Il7r, Cd34, Csf1r and Ly6d for 1,802 cells from three independent experiments with n = 4 mice.
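The profile preparation preceding the SOM can be sketched as follows: keep cells above the fate-bias cutoff, order them by pseudotime, smooth, and z-score. This is an illustration under assumptions: a running mean stands in for the local regression, smoothing is applied before the z-score transformation, and the SOM itself is not shown. The function names and the `half_window` parameter are hypothetical.

```python
import statistics

def zscore(profile):
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in profile]

def running_mean(profile, half_window):
    """Simple moving average, used here as a stand-in for local regression."""
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half_window), min(len(profile), i + half_window + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

def pseudotemporal_profile(expr, pseudotime, fate_bias, min_bias=0.15, half_window=2):
    """Z-scored, smoothed expression of one gene along pseudotime for cells with fate bias > min_bias."""
    ordered = sorted((t, e) for t, e, b in zip(pseudotime, expr, fate_bias) if b > min_bias)
    return zscore(running_mean([e for _, e in ordered], half_window))

# Toy gene: the first cell falls below the pDC fate-bias cutoff and is dropped.
print(pseudotemporal_profile([5.0, 1.0, 2.0, 3.0], [0.9, 0.1, 0.2, 0.3], [0.1, 0.5, 0.5, 0.5], half_window=0))
```

Z-scoring puts genes with very different absolute expression on a common scale, so that the SOM groups genes by the shape of their pseudotemporal profile rather than by expression level.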

Supplementary Figure 9 FateID reveals fate bias in myeloid progenitors.

(a) t-SNE map based on transcriptome similarity, highlighting the origin of each cell. The published dataset (ref. 1) comprises common myeloid progenitors (CMPs), Irf8-GFP+MHC-II+ dendritic cell progenitors, Flt3+Csf1r+ monocyte progenitors, and Cd41+ megakaryocyte progenitors sorted from the CMP gate. (b) t-SNE map highlighting clusters of cells with similar transcriptomes derived by RaceID3. Clusters 17, 1, 11, 5, and 8 were used as FateID target clusters for the erythrocyte, megakaryocyte, dendritic cell, monocyte, and granulocyte lineages, respectively. (c-g) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 2; middle) is shown along with log2-transformed normalized expression of a lineage marker. Fate bias and marker gene expression are shown for (c) the megakaryocyte, (d) the dendritic cell, (e) the granulocyte, (f) the monocyte, and (g) the erythrocyte lineage. (h) Barplot comparing Spearman's correlation coefficient between the expression levels of early lineage markers and fate bias computed by FateID and STEMNET. Error bars correspond to standard errors of Fisher's z-transformed correlation values calculated across all cells after removal of target clusters (1,927 cells from n = 4 animals). P-values were derived from the difference of z-scores divided by the standard error, assuming a standard normal distribution, using Williams' test (*P < 0.05, **P < 0.001). (i) t-SNE map highlighting expression of Flt3, showing that Flt3 does not discriminate between the monocyte and the dendritic cell lineage. (j-n) Scatterplots comparing fate bias predicted by FateID and STEMNET for (j) the megakaryocyte, (k) the monocyte, (l) the granulocyte, (m) the dendritic cell, and (n) the erythrocyte lineage.
Although the predictions are correlated overall, STEMNET predicts more uniform levels across a larger fraction of the multipotent cell population. In (a-g) and (i-n), data for 2,370 cells from four independent experiments with n = 4 mice are shown.
1. Paul, F. et al. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell 163, 1663–1677 (2015).
2. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
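The statistics in (h) can be sketched as follows. Fisher's z-transform gives the error bars, and Williams' test compares two correlations that share one variable (marker expression) and are therefore dependent, so it also needs the correlation between the two fate-bias predictions. This sketch uses the Williams statistic in its commonly used form (t-distributed with n − 3 degrees of freedom) and, as the caption states, takes the two-sided P-value from the standard normal distribution, which is close for n in the thousands. Applying it to Spearman rather than Pearson coefficients is an approximation, and the function names and input correlations are hypothetical.

```python
import math

def fisher_z(r, n):
    """Fisher z-transform of a correlation and its standard error."""
    return math.atanh(r), 1.0 / math.sqrt(n - 3)

def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' statistic for H0: r_jk == r_jh, where variable j is shared.
    Here j = marker expression, k = FateID bias, h = STEMNET bias, r_kh = corr(FateID, STEMNET)."""
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)

def two_sided_normal_p(stat):
    """Two-sided P-value under a standard normal approximation."""
    return math.erfc(abs(stat) / math.sqrt(2))

# Made-up correlations for one marker over 1,927 non-target cells.
t = williams_t(r_jk=0.45, r_jh=0.38, r_kh=0.80, n=1927)
print(round(t, 2), two_sided_normal_p(t))
```

Ignoring r_kh and treating the two correlations as independent would overstate the standard error when the FateID and STEMNET predictions are themselves highly correlated, which is why a dependent-correlation test is used.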

Supplementary Figure 10 FateID reveals fate bias of intestinal epithelial progenitors.

(a) t-SNE map highlighting clusters of cells with similar transcriptomes derived by RaceID3 from single-cell RNA-seq data of intestinal epithelial cells (ref. 1). (b) Heatmap of log2-transformed averaged normalized expression across clusters. The cluster number and color are indicated on the right. Only clusters with >3 cells were included. A hierarchical clustering dendrogram is shown on the right margin. (c-g) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 2; middle) is shown along with log2-transformed aggregated normalized expression of two lineage markers. Fate bias and marker gene expression are shown for (c) the Paneth cell, (d) the goblet cell, (e) the enteroendocrine cell, (f) the enterocyte, and (g) the tuft cell lineage. In (a-g), data for 505 cells from n = 3 animals are shown. (h) Barplot comparing Spearman's correlation coefficient between the expression levels of early lineage markers and fate bias computed by FateID and STEMNET. Error bars correspond to standard errors of Fisher's z-transformed correlation values calculated across all cells after removal of target clusters (303 cells from three independent experiments with n = 3 mice). P-values were derived from the difference of z-scores divided by the standard error, assuming a standard normal distribution, using Williams' test (*P < 0.05, **P < 0.001). (i-l) t-SNE maps highlighting log2-transformed normalized transcript levels of (i) Neurog3, (j) Neurod1, (k) Muc2, and (l) Clca4. (m-q) Scatterplots comparing fate bias predicted by FateID and STEMNET for (m) the Paneth cell, (n) the goblet cell, (o) the enteroendocrine cell, (p) the enterocyte, and (q) the tuft cell lineage. Although the predictions are correlated overall, STEMNET predicts more uniform levels across a larger fraction of the multipotent cell population.
In (a-g) and (i-q), data for 505 cells from three independent experiments with n = 3 mice are shown.
1. Grün, D. et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 19, 266–277 (2016).
2. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).

Supplementary Figure 11 Monocle 2 identifies major branches.

(a) Lineage tree inferred by Monocle 2 (ref. 1) using reversed graph embedding (see Online Methods for details). Cell types identified on the basis of marker genes are highlighted. (b) Same as (a), but with clusters inferred by Monocle 2 highlighted. (c) Monocle 2-derived lineage trees highlighting log2-transformed normalized transcript levels of Kit and Ly6a (multipotency), Dntt (lymphoid lineage), Ebf1 (B cell lineage), Siglech (plasmacytoid dendritic cells), Cd74 (conventional dendritic cells), Elane (neutrophils), Gata1 (erythrocyte lineage), and Thy1 (innate lymphoid cells). Monocle 2, an established method for deriving multi-branched lineage trees, failed to resolve Thy1-positive innate lymphocyte progenitors from B cell progenitors and distributed pDC progenitors across several branches. Furthermore, the fixed assignment of a cell to a branch does not predict residual bias towards one or more alternative lineages. A probabilistic view of cell fate decisions is more appropriate for capturing transitions between cell states biased towards distinct fates, and available algorithms for the prediction of lineage trees do not quantify co-existing multi-lineage bias. Monocle 2 results are shown for 1,802 cells from three independent experiments with n = 4 mice.
1. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).

Supplementary Figure 12 Predicting the fate bias of human hematopoietic progenitors with FateID.

For this analysis, single cell RNA-seq data generated by Smart-seq2 for individual 1 from Velten et al.1 were used. (a) t-SNE map showing clusters of cells with similar transcriptomes derived by RaceID3. (b) Heatmap of log2-transformed averaged normalized expression of known marker genes across clusters. The cluster number and color are indicated on the right. Only clusters with >3 cells were included. A hierarchical clustering dendogram is shown on the right margin. (c) Correlation of predicted fate bias and cell surface marker expression measured by index-sorting for FateID (left) and STEMNET (right). Surface expression of CD135 (encoded by FLT3) and CD45RA discriminates progenitors of neutrophils, monocytes, pDCs, and lymphocytes, on the one hand, and eosinophils/basophils/mast cells, erythrocytes and megakaryocytes, on the other hand. Surface expression of the two markers is positively correlated to fate bias towards the former group of lineages and inversely correlated to fate bias towards the latter group of lineages. FateID predictions show a more pronounced difference between the two groups. (d) Shown are t-SNE maps highlighting log-transformed fluorescence intensity measured by flow cytometry (index-sorting) for CD135 (top) and CD35RA (bottom). (e) The fate bias predicted by FateID, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias is shown for the B cell lineage (left) and the pDC lineage (right). The black circle marks a population of cells with enhanced fate bias towards pDCs or B cells (f) Shown are t-SNE maps highlighting log-transformed normalized transcript expression of the B cell lineage marker VPREB1 (left) and the pDC lineage marker IRF8 (right). 
The expression domains of these markers overlap with the predicted domains of fate bias towards the respective lineages, and lymphoid progenitors with enhanced bias towards the pDC lineage co-express the two markers (black circle). (g) Fate bias predicted by STEMNET for the B cell lineage (left) and the pDC lineage (right). Black circle: see (f). (h) Scatterplots comparing fate bias predicted by FateID and STEMNET for the pDC lineage (left) and the B cell lineage (right). Although the predictions are correlated overall (Spearman's correlation coefficient of 0.74 for the B cell lineage and 0.67 for the pDC lineage), STEMNET predicts more uniform levels of fate bias across a larger fraction of progenitors. All panels show data from 1,035 cells sequenced from n = 1 donor.

1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
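Spearman's coefficient in (h) is the Pearson correlation of the ranks of the two predictions, so it measures agreement in how the two methods order cells rather than agreement in their absolute fate bias values. The stdlib Python sketch below shows that computation, with tied values given their average rank; the fate bias vectors are invented for illustration and are not data from the figure.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical fate bias towards one lineage for six cells
fateid = [0.05, 0.10, 0.40, 0.70, 0.90, 0.95]
stemnet = [0.20, 0.25, 0.30, 0.35, 0.30, 0.40]  # narrower, more uniform range
print(round(spearman(fateid, stemnet), 3))
```

Because only ranks enter the calculation, a method that compresses its predictions into a narrower, more uniform range can still correlate strongly with another method, which is consistent with the high coefficients reported in (h) despite the difference in spread.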

Supplementary Figure 13 Sorting strategy for in vitro differentiation of B cell and pDC progenitors.

(a) Sorting strategy for B cell- versus plasmacytoid dendritic cell-biased progenitors and staining of cultured progenitors. Cells were gated using SSC-A versus FSC-A (not shown). Single cells were then selected by SSC-W versus SSC-H gating followed by FSC-W versus FSC-H gating (not shown). Next, lineage-negative (Lin-) cells were selected by excluding cells positive for the lineage markers TER-119, B220, Cd11b, Gr-1, SiglecH, Cd19 and Cd3ɛ (FITC). The next gate included only Flt3-positive (Flt3+) cells. Flt3+ cells were plotted as Kit (BV510) versus Sca-1 (BV650), and cells with intermediate expression of Kit and Sca-1 (common lymphoid progenitors, CLP) were retained. Finally, CLP cells were plotted as IL-7R (BV421) versus CD34 (PE). For the culture experiments and for single-cell sequencing, stringent gates were used to sort Cd34-Il7r+, Cd34+Il7r+ and Cd34+Il7r- cells. In addition, lymphoid-primed multipotent progenitors (LMPP) were sorted for culturing and for single-cell sequencing. Fluorescence-minus-one controls were used for CD34 (PE) to set the sorting gates appropriately (not shown). The threshold for Il7r-positive cells was set according to the unstained control (not shown). (b) Representative flow cytometry plots of progenitors from one mouse after 7 days of culture in either B cell medium or plasmacytoid dendritic cell (pDC) medium. Surface expression of Siglech and Cd19 was assessed to check for commitment to the pDC or B cell lineage. Independent experiments were performed for n = 5 animals. Data for the flow cytometry plots in (a) are from one experiment with n = 3 mice, whereas data for the plots in (b) are from one experiment with n = 1 mouse.
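The sequential gating in (a) amounts to a chain of boolean filters applied in order. The Python sketch below expresses it that way for individual events; every threshold is a hypothetical placeholder (the actual gates were drawn on fluorescence-minus-one and unstained controls), and the scatter and singlet gates are omitted. The `Event` record and the names `is_clp` and `clp_subset` are illustrative and not part of any cytometry software.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One compensated cytometry event (arbitrary intensity units)."""
    lineage: float  # FITC lineage cocktail
    flt3: float
    kit: float
    sca1: float
    il7r: float
    cd34: float


# Hypothetical gate boundaries; real gates came from FMO and unstained controls
LIN_MAX = 1e3
FLT3_MIN = 1e3
KIT_INT = (1e3, 1e4)   # "intermediate" Kit window
SCA1_INT = (1e3, 1e4)  # "intermediate" Sca-1 window
IL7R_MIN = 1e3
CD34_MIN = 1e3


def is_clp(e):
    """Lin- -> Flt3+ -> Kit-int Sca-1-int, applied in the order of panel (a)."""
    return (e.lineage < LIN_MAX
            and e.flt3 >= FLT3_MIN
            and KIT_INT[0] <= e.kit < KIT_INT[1]
            and SCA1_INT[0] <= e.sca1 < SCA1_INT[1])


def clp_subset(e):
    """Name the CD34 / IL-7R quadrant of a CLP-gated event."""
    cd34 = "+" if e.cd34 >= CD34_MIN else "-"
    il7r = "+" if e.il7r >= IL7R_MIN else "-"
    return f"Cd34{cd34}Il7r{il7r}"


events = [
    Event(lineage=100, flt3=5e3, kit=3e3, sca1=2e3, il7r=4e3, cd34=200),  # CLP
    Event(lineage=100, flt3=5e3, kit=3e3, sca1=2e3, il7r=4e3, cd34=5e3),  # CLP
    Event(lineage=5e4, flt3=5e3, kit=3e3, sca1=2e3, il7r=4e3, cd34=5e3),  # Lin+
]
print(Counter(clp_subset(e) for e in events if is_clp(e)))
```

Writing each gate as a separate condition keeps the order of panel (a) visible, so a single threshold can be changed without touching the others.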

Supplementary Figure 14 Revised model of hematopoietic differentiation.

(a) t-SNE map with a principal curve fitted to all cells within the progenitor clusters (1, 2, 3, 7). (b) Heatmap of z-score-transformed pseudotemporal expression profiles of selected multipotency and lineage marker genes. Cells were ordered along the principal curve in (a), and profiles were smoothed by local regression. The bottom panel depicts a local regression of the fate bias using the same temporal ordering. Successive expression of the multipotency markers Kit, Ly6a, Ifitm1, Cd34, Cd48 and Flt3 is consistent with the ordering of multipotent progenitor (MPP) stages MPP1 to MPP4 previously inferred from bulk measurements [1,2]. Lineage markers comprise Gata2, Car2, Gata1 and Pf4 for the erythrocyte/megakaryocyte lineage; Cebpa, Csf1r and Mpo for the granulocyte and monocyte lineage; Itgax for the conventional dendritic cell lineage; Irf8 and Tcf4 for the plasmacytoid dendritic cell lineage; and Il7r, Rag1, Ebf1 and Dntt for the B cell lineage. (c) Pictorial representation of the derived lineage tree. In (a) and (c), data are shown for 1,802 cells from three independent experiments with n = 4 mice. In (b), profiles are shown for 1,416 cells from three independent experiments with n = 4 mice. (d) Hematopoietic lineage tree inferred by StemID2. Only significant links are shown (P < 0.01). The color of a link indicates its −log10 P value, the color of a vertex indicates its entropy, and the thickness of a link indicates its link score, which reflects how densely the link is covered with cells. The lineage tree is consistent with the independently derived fate bias estimates: erythrocytes (cluster 9) branch off from cluster 1, clusters 2 and 3 give rise to granulocytes/monocytes (cluster 17), and clusters 2 and 7 comprise progenitors of pDCs (cluster 12) and B cells (cluster 10). (e) Bar plot of StemID2 scores for hematopoietic clusters. Cluster 1, which shows the highest expression of HSC markers such as Ifitm1 [2], receives the highest StemID2 score.
In (d) and (e), only clusters with >5 cells were included and a link score cut-off of 0.5 was applied. The StemID2 computation in (d) and (e) was performed on 1,802 cells from three independent experiments with n = 4 mice.

1. Wilson, A. et al. Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell 135, 1118–1129 (2008).
2. Cabezas-Wallscheid, N. et al. Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Cell Stem Cell 15, 507–522 (2014).
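The pseudotemporal profiles in (b) combine three steps: ordering cells by their position along the principal curve, smoothing each gene's expression by local regression, and z-scoring the result. The stdlib Python sketch below assumes the pseudotime values (projections onto the principal curve) are already computed and uses a LOESS-style local linear fit with tricube weights as the smoother. The span, the smooth-then-scale order and the function names are assumptions for illustration, not the settings used for the figure.

```python
from math import sqrt


def local_linear_smooth(t, y, span=0.5):
    """LOESS-style smoother: at each t0, fit a tricube-weighted straight line
    to the nearest span*n points and evaluate it at t0."""
    n = len(t)
    k = min(n, max(3, int(round(span * n))))
    fitted = []
    for t0 in t:
        nearest = sorted(range(n), key=lambda i: abs(t[i] - t0))[:k]
        # widen the bandwidth slightly so the k-th neighbour keeps a small weight
        h = 1.0001 * max(abs(t[i] - t0) for i in nearest) or 1.0
        sw = swt = swy = swtt = swty = 0.0
        for i in nearest:
            w = (1 - (abs(t[i] - t0) / h) ** 3) ** 3
            sw += w
            swt += w * t[i]
            swy += w * y[i]
            swtt += w * t[i] ** 2
            swty += w * t[i] * y[i]
        denom = sw * swtt - swt ** 2
        if abs(denom) < 1e-12:  # degenerate window (e.g. identical t): weighted mean
            fitted.append(swy / sw)
            continue
        slope = (sw * swty - swt * swy) / denom
        intercept = (swy - slope * swt) / sw
        fitted.append(intercept + slope * t0)
    return fitted


def zscore(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def pseudotemporal_profile(pseudotime, expression, span=0.5):
    """Order cells by pseudotime, smooth one gene's expression, then z-score it."""
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    t = [pseudotime[i] for i in order]
    y = [expression[i] for i in order]
    return zscore(local_linear_smooth(t, y, span))


# Hypothetical pseudotime and expression values for six cells (unordered input)
pt = [0.3, 0.1, 0.2, 0.0, 0.5, 0.4]
expr = [3.0, 1.0, 2.0, 0.0, 5.0, 4.0]
print([round(v, 2) for v in pseudotemporal_profile(pt, expr)])
```

Z-scoring each smoothed profile puts genes with very different expression levels on a common scale, so a heatmap row shows when a gene rises or falls along pseudotime rather than how highly it is expressed.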

Supplementary Figure 15 FateID recovers a bipotent progenitor of monocytes and neutrophils.

(a) t-SNE map of the clusters identified by RaceID3 analysis of the murine hematopoietic transcriptome data from Olsson et al. [1]. The cell types described in the original study were recovered and are indicated next to the clusters. (b) t-SNE maps highlighting normalized expression of lineage-specific markers. (c) FateID fate bias predictions for all lineages. (d) Scatterplot comparing fate bias towards the monocyte and the neutrophil lineage. Aggregated marker gene expression is highlighted for monocytes (left; Irf8, Csf1r, Ly86), neutrophils (middle; Gfi1, Cebpe, S100a8) and the bipotential progenitor (right; Ctsg, Elane, Mpo). The markers are taken from Figure 1c of Olsson et al. [1]. The FateID analysis reveals that lineage-specific marker gene expression coincides with uni-lineage bias, whereas progenitors with similar fate biases towards monocytes and neutrophils express markers shared by both lineages as well as low levels of the lineage-specific markers. This is consistent with the interpretation of Olsson et al. that these cells represent bipotent progenitors. We note that the bipotency of this transitional state was validated by in vitro differentiation assays in the original study. All panels show data from 382 cells sequenced from n = 3 mice.

1. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
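Panel (d) pairs two per-cell quantities: an aggregate score over a marker set and the fate bias towards each of the two lineages. The Python sketch below computes both ingredients for the marker sets listed in the caption. It assumes aggregation by summing normalized expression and labels a cell as "mixed" when its monocyte and neutrophil biases differ by less than a hypothetical margin; neither choice is taken from the study, and the names `aggregate_expression` and `bias_category` are illustrative.

```python
MONOCYTE_MARKERS = ["Irf8", "Csf1r", "Ly86"]
NEUTROPHIL_MARKERS = ["Gfi1", "Cebpe", "S100a8"]
BIPOTENT_MARKERS = ["Ctsg", "Elane", "Mpo"]


def aggregate_expression(cell, markers):
    """Sum of normalized expression over a marker set; absent genes count as 0."""
    return sum(cell.get(gene, 0.0) for gene in markers)


def bias_category(bias_mono, bias_neu, margin=0.2):
    """Label a cell by the balance of its monocyte vs neutrophil fate bias.
    `margin` is a hypothetical cut-off, not a value from the study."""
    diff = bias_mono - bias_neu
    if diff > margin:
        return "monocyte-biased"
    if diff < -margin:
        return "neutrophil-biased"
    return "mixed"


# Hypothetical normalized expression and fate bias for one transitional cell
cell = {"Ctsg": 3.0, "Elane": 2.5, "Mpo": 4.0, "Irf8": 0.2, "Gfi1": 0.3}
for name, markers in [("monocyte", MONOCYTE_MARKERS),
                      ("neutrophil", NEUTROPHIL_MARKERS),
                      ("bipotent", BIPOTENT_MARKERS)]:
    print(name, aggregate_expression(cell, markers))
print(bias_category(0.42, 0.38))  # -> mixed
```

A cell like this one, with a high shared-marker score, low lineage-specific scores and balanced fate bias, corresponds to the transitional population that the caption describes as bipotent.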

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Notes 1–5

Life Sciences Reporting Summary

About this article

Cite this article

Herman, J., Sagar & Grün, D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat Methods 15, 379–386 (2018) doi:10.1038/nmeth.4662

Further reading