FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data

Abstract

To understand stem cell differentiation along multiple lineages, it is necessary to resolve heterogeneous cellular states and the ancestral relationships between them. We developed a robotic miniaturized CEL-Seq2 implementation to carry out deep single-cell RNA-seq of 2,000 mouse hematopoietic progenitors enriched for lymphoid lineages, and used an improved clustering algorithm, RaceID3, to identify cell types. To resolve subtle transcriptome differences indicative of lineage biases, we developed FateID, an iterative supervised learning algorithm for the probabilistic quantification of cell fate bias in progenitor populations. Here we used FateID to delineate domains of fate bias and enable the derivation of high-resolution differentiation trajectories, thereby revealing a common progenitor population of B cells and plasmacytoid dendritic cells, which we validated by in vitro differentiation assays. We expect that FateID will improve understanding of the process of cell fate choice in complex multi-lineage differentiation systems.

Figure 1: Elucidating transcriptome heterogeneity of multipotent hematopoietic progenitors by scRNA-seq.
Figure 2: FateID quantifies lineage bias in multipotent progenitor populations.
Figure 3: The importance of genes for lineage classification depends on the differentiation stage.
Figure 4: FateID identifies a common progenitor population of B cells and pDCs.
Figure 5: In vitro differentiation assay confirmed predicted lymphoid progenitor populations of pDCs and B cells.

Accession codes

Primary accessions

Gene Expression Omnibus

Acknowledgements

We thank A. Pospisilik for help in developing mCEL-Seq2. We acknowledge extensive support from S. Hobitz, K. Schuldes, and D. Wild in flow cytometry, and U. Boenisch in deep sequencing. We also thank T. Boehm, C. Happe, and R. Grosschedl for valuable input and support. We thank T. Boehm, N. Cabezas-Wallscheid, and E. Trompouki for critical reading of the manuscript and valuable feedback. S. acknowledges funding from the Behrens-Weise Foundation. This work was supported by the Max Planck Society.

Author information

Contributions

J.S.H. performed all experiments, analyzed the data, and created the web interface; S. established the mCEL-Seq2 protocol with the help of J.S.H.; D.G. designed the study, developed the algorithm, and wrote the paper; and all authors edited the manuscript.

Corresponding author

Correspondence to Dominic Grün.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Miniaturized high-throughput version of CEL-Seq2 maintains high sensitivity and accuracy.

(a-e) Violin plots showing a comparison between manual CEL-Seq2 and the robotic version at different volume reductions for the distributions of (a) the number of transcripts per mESC, (b) the fraction of recovered ERCC spike-in RNA (ref. 1), (c) the number of genes per mESC, (d) Pearson’s correlation coefficient between measured and actual spike-in numbers, and (e) Pearson’s correlation coefficient between spike-in levels measured in distinct cells. In (a-e) violin plots represent the density distribution of the data. Overlaid boxplots show the median (white dot) and the interquartile range (box limits). The whiskers extend to the most extreme data point within 1.5 times the interquartile range from the box. Outliers are indicated. The sample size in (a-e) is 48 cells for each group from n = 1 experiment. (f, g) Dependence of sequencing efficiency on sequence composition. A regression was calculated of the average sequenced spike-in number on the actual spike-in number, setting the intercept to zero. The scatter plots show the dependence of the deviation of the measured spike-in level from the regression line, normalized by the average expression, on (f) percentage GC content and (g) sequence length. Data points for 96 ERCC spike-in sequences are shown in (f) and (g). Shown data are from one experiment.

1. Baker, S. C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–4 (2005).
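The zero-intercept regression described for panels (f, g) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the reading of "normalized by the average expression" as division by each spike-in's own average measured level is an assumption.

```python
def slope_through_origin(actual, measured):
    """Least-squares slope for measured = slope * actual (intercept fixed at 0)."""
    sxy = sum(x * y for x, y in zip(actual, measured))
    sxx = sum(x * x for x in actual)
    if sxx == 0:
        raise ValueError("actual spike-in numbers must not all be zero")
    return sxy / sxx


def normalized_deviations(actual, measured):
    """Deviation of each spike-in from the regression line, divided by its
    average measured level (assumed normalization). Spike-ins with a measured
    level of zero are reported as None."""
    slope = slope_through_origin(actual, measured)
    deviations = []
    for x, y in zip(actual, measured):
        if y == 0:
            deviations.append(None)
        else:
            deviations.append((y - slope * x) / y)
    return deviations


# Toy numbers for illustration only (not data from the study).
actual_counts = [1.0, 2.0, 4.0]
average_measured = [0.5, 1.0, 2.0]
print(slope_through_origin(actual_counts, average_measured))
print(normalized_deviations(actual_counts, average_measured))
```

The resulting deviations would then be plotted against GC content or sequence length for each spike-in.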

Supplementary Figure 2 Gate settings used for index sorting of hematopoietic progenitors.

Two different sorting schemes were used. In a first experiment with n = 2 mice (sorting scheme 1), we purified 384 Lineage− (Lin−) Kit(hi)Sca-1(hi) (LSK) cells with high surface expression of Kit and Sca-1 (encoded by Ly6a), a permissive sorting strategy to sample the pool of HSCs and multipotent progenitors. In addition, we sorted 768 lymphoid-primed multipotent progenitors (LMPP) as Flt3+ LSK cells, and 384 Lin−Kit(lo)Sca-1(lo) as well as 192 Lin−Kit(lo)Sca-1(lo/−)Flt3(hi) common lymphoid progenitors (CLP) in order to enrich for the lymphoid branch. We deliberately did not gate on Il7r to comprehensively sample CLPs (refs. 1–3). In order to capture more mature and potentially underrepresented progenitor states of all hematopoietic lineages, we subsequently applied a second sorting strategy (sorting scheme 2). Here, we sequenced 1,152 cells from 16 adjacent windows spanning different ranges of Kit and Sca-1 expression and being either Flt3+ or Flt3−. Cells in (a) and (b) were sorted from n = 2 mice each in another independent experiment. Flow cytometry data are from one experiment with n = 1 mouse.

1. Inlay, M. A. et al. Ly6d marks the earliest stage of B-cell specification and identifies the branchpoint between B-cell and T-cell development. Genes Dev. 23, 2376–81 (2009).

2. Mansson, R. et al. Single-cell analysis of the common lymphoid progenitor compartment reveals functional and molecular heterogeneity. Blood 115, 2601–2609 (2010).

3. Tsapogas, P. et al. IL-7 mediates Ebf-1-dependent lineage restriction in early lymphoid progenitors. Blood 118, 1283–1290 (2011).

Supplementary Figure 3 Single-cell sequencing of index-sorted lymphoid progenitor populations.

(a) Boxplot of transcript count per cell distribution for LSK cells, LMPPs and CLPs. The sample sizes are 370, 748, and 494 cells for LSK cells, LMPPs, and CLPs, sorted from two independent experiments with n = 2 mice. (b, c) Saturation analysis. (b) The number of UMIs detected in cells of the LMPP sample is shown as a function of the fraction of reads used for analysis. (c) The number of genes detected in cells of the LMPP sample is shown as a function of the fraction of reads used for analysis. (d) To benchmark the quality of our dataset, we compared Cd34+ LMPPs to mouse (and human) Cd34+ (CD34+) datasets generated with different sequencing technologies (refs. 1–3), including Smart-seq2 (ref. 4), the commercial 10x GemCode technology and MARS-seq (ref. 5). Boxplot of UMI count distribution in Cd34+ (or CD34+) cells from different mouse (or human) hematopoietic progenitor datasets. Herman: Cd34+ LMPPs from this study (725 cells from n = 2 mice). Paul (ref. 2): Cd34+ cells from the common myeloid progenitor gate sequenced with MARS-seq (2,370 cells from n = 4 mice). Zheng (PB) (ref. 3): Cd34+ (mRNA-)positive cells from the CD34+ surface protein–positive peripheral blood population generated with 10x GemCode technology (3,213 cells from n = 1 donor). Zheng (BM) (ref. 3): Cd34+ cells from post-transplantation bone marrow of an acute myeloid leukemia patient (AML027) generated with 10x GemCode technology (290 cells from n = 1 donor). Selection was always based on Cd34 (CD34) mRNA levels. (e) Boxplot of the number of detected genes per cell for the same samples as in (d) and a high-sensitivity dataset sequenced with a non-UMI based full-length coverage technology. Velten (ref. 1): CD34+ human bone marrow cells from individual 1 sequenced with Smart-seq2 (1,035 cells from n = 1 donor). In (a-e), bold line indicates the median and box limits represent interquartile range. The whiskers extend to the most extreme data point within 1.5 times the interquartile range from the box. Outliers are indicated.
(f-j) Shown are t-SNE maps highlighting fluorescence intensity measured by index-sorting to enable simultaneous quantification of the transcriptome and cell surface marker expression for (f) Kit, (g) Sca-1, (h) Flt3, (i) Il7r, and (j) Ly6d. In (f-j) 1,949 cells from n=4 animals are shown. (k) Barplot of Spearman’s correlation coefficient between surface protein expression quantified by fluorescence intensity and mRNA expression measured by single-cell RNA-seq for the same cell. The correlation coefficient was calculated for 1,949 cells from three independent experiments with n = 4 mice.

1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).

2. Paul, F. et al. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell 163, 1663–1677 (2015).

3. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

4. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–8 (2013).

5. Jaitin, D. A. et al. Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types. Science 343, 776–779 (2014).
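Panel (k) reports Spearman’s correlation coefficient between surface protein fluorescence and mRNA level measured in the same cell. As a minimal sketch of that statistic (Pearson’s correlation computed on tie-averaged ranks), the following pure-Python functions may help; the function names and the toy values are illustrative assumptions, not part of the published analysis.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-cell values for illustration only (not data from the study).
fluorescence = [120.0, 340.0, 560.0, 9000.0]
transcripts = [0.0, 1.0, 3.0, 7.0]
print(spearman(fluorescence, transcripts))
```

Because only ranks enter the calculation, the coefficient is insensitive to the very different scales of fluorescence intensity and transcript counts.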

Supplementary Figure 4 Expression domains of lineage marker genes.

Shown are t-SNE maps highlighting log2-transformed normalized transcript levels of Kit and Ly6a (multipotency), Flt3 and Il7r (lymphoid lineage), Rag1 and Ebf1 (B cell lineage), Irf8 and Siglech (plasmacytoid dendritic cells), Cd79a (pre-B cells), Cd74 and Itgax (conventional dendritic cells), Mpo (granulocyte/monocyte lineage), Elane (neutrophils), Car2, Gata1 and Hbb-bs (erythrocyte lineage), Thy1 (innate lymphoid cells), Ncr1 (natural killer cells), Icos and Gata3 (innate lymphoid helper cell type 2 lineage), Cd3d (NKT cells), and Pf4 (megakaryocyte lineage). All t-SNE maps show 1,949 cells from three independent experiments with n = 4 mice.

Supplementary Figure 5 Benchmarking of RaceID3.

RaceID3 shows superior performance in recovering domains of marker gene expression in comparison to RCA (ref. 1), SC3 (ref. 2), Seurat (ref. 3), and ICGS (ref. 4). RaceID3 was run with random forests-based reclassification (rf) and without. All clustering methods except for ICGS were run with different parameters to change sensitivity and obtain different cluster numbers (see Online methods). By this strategy, overlapping ranges of cluster numbers were obtained for each method. Only ICGS does not have a parameter to allow adjustment of the sensitivity. Shown is the maximum log2-transformed fold-enrichment (left panel) and the entropy of the distribution of mean expression levels (right panel) of a given lineage marker gene across all clusters detected as a function of cluster number. In comparison to all other tested methods, the fold-enrichment of the RaceID3 predictions is substantially higher for most lineage markers and of similar magnitude to the best performing methods for the other ones. At the same time, the entropy as a function of cluster number is consistently lower for most marker genes and follows a similar trend for the remaining ones, when comparing RaceID3 to the other methods. In conclusion, the benchmarking demonstrates that RaceID3 optimizes the overlap of known marker gene expression domains with predicted cell types.

1. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).

2. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).

3. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

4. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
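The two benchmark quantities in this legend can be illustrated with a short Python sketch. The legend does not give the exact definitions, so the following rests on assumptions: fold-enrichment is taken as the highest cluster mean relative to the mean across all cells, a pseudocount is added to avoid division by zero, and the entropy is the Shannon entropy (in bits) of the cluster means normalized to sum to one. All function names are hypothetical.

```python
import math


def cluster_means(expression, labels):
    """Mean expression of one marker gene within each cluster."""
    sums, counts = {}, {}
    for value, cluster in zip(expression, labels):
        sums[cluster] = sums.get(cluster, 0.0) + value
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def max_log2_fold_enrichment(expression, labels, pseudocount=0.1):
    """log2 ratio of the highest cluster mean to the overall mean
    (assumed reference level; the pseudocount is also an assumption)."""
    overall = sum(expression) / len(expression)
    top = max(cluster_means(expression, labels).values())
    return math.log2((top + pseudocount) / (overall + pseudocount))


def expression_entropy(expression, labels):
    """Shannon entropy (bits) of the normalized cluster means. Low entropy
    means the marker is concentrated in few clusters."""
    means = cluster_means(expression, labels)
    total = sum(means.values())
    if total == 0:
        raise ValueError("marker is not expressed in any cluster")
    probs = [m / total for m in means.values() if m > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy example: a marker confined to cluster "a" (illustrative values only).
marker = [4.0, 4.0, 0.0, 0.0]
clusters = ["a", "a", "b", "b"]
print(max_log2_fold_enrichment(marker, clusters))
print(expression_entropy(marker, clusters))
```

Under these definitions, a clustering that isolates a marker's expression domain in a single cluster yields high fold-enrichment and low entropy, which matches the direction of the comparison described in the legend.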

Supplementary Figure 6 FateID analysis of mouse hematopoietic progenitors.

(a, b) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 1) (middle) is shown along with log2-transformed aggregated normalized expression of two lineage markers. Fate bias and marker gene expression are shown for (a) the conventional dendritic cell, and (b) the innate lymphoid lineage. In (a) and (b) data for 1,802 cells from n = 4 animals are shown. (c) Scatterplot of normalized expression levels of Kit and Mpo. The predicted neutrophil fate bias is color-coded. Fate bias increases with Mpo expression and is inversely related to the level of Kit. (d) Scatterplot of normalized expression levels of Kit and Tcf4. The predicted pDC fate bias is color-coded. Fate bias increases with Tcf4 expression and is inversely related to the level of Kit. (e) Scatterplot of normalized expression levels of Kit and Ebf1. The predicted B cell fate bias is color-coded. Fate bias increases with Ebf1 expression and is inversely related to the level of Kit. (f) Scatterplot of normalized expression levels of Kit and Car2. The predicted erythrocyte fate bias is color-coded. Fate bias increases with Car2 expression and is inversely related to the level of Kit. (g) Scatterplot of normalized expression levels of Kit and Cd74. The predicted cDC fate bias is color-coded. Fate bias increases with Cd74 expression and is inversely related to the level of Kit. (h) Scatterplot of normalized expression levels of Kit and Tcf7. The predicted NK/NKT/ILC2 fate bias is color-coded. Fate bias increases with Tcf7 expression and is inversely related to the level of Kit. (i-n) Scatterplots comparing fate bias predicted by FateID and STEMNET for (i) the neutrophil, (j) the pDC, (k) the B cell, (l) the erythrocyte, (m) the cDC, and (n) the NK/NKT/ILC lineage.
Although the predictions are overall correlated, STEMNET predicts rather uniform levels across a large fraction of the multipotent cell population, suggesting that each cell is multipotent, while FateID predictions sample the full range of possible values for all lineages, comprising cells with zero fate probability, intermediate values, or more substantial bias towards a given lineage. The higher resolution of FateID is potentially explained by the dynamic composition of the training set, which comprises more mature cells during the first iterations, while it includes earlier differentiation stages at later iterations to classify more naïve cells. In contrast to STEMNET, this strategy avoids classifying naïve cells with the help of genes expressed only at late stages of differentiation. In all panels data for 1,802 cells from three independent experiments with n = 4 mice are shown.

1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).

Supplementary Figure 7 Classifier genes depend on differentiation stage.

(a-e) The heatmaps show genes with a random forests importance measure >0.02 and a ratio between the absolute importance and its standard deviation >2 for at least a single iteration. Iterations are depicted on the x-axis with the first iteration to the left and the final iteration to the right. Early iterations correspond to more mature stages while late iterations correspond to more naïve stages. A hierarchical clustering dendrogram is indicated on the left margin. Heatmaps are shown for (a) the neutrophil, (b) the pDC, (c) the erythrocyte, (d) the cDC, and (e) the NK/NKT/ILC lineage. (f) Heatmap of Spearman’s correlation coefficient between the fate bias predicted by FateID and cell surface marker expression. An elastic-net regularized linear regression with the normal family as used by Velten et al. (ref. 1) confirms the same trends, but the correlation-based analysis better discriminates the sub-populations corresponding to distinct lineages. In (a-f) data derived from 1,802 cells from three independent experiments with n = 4 mice are shown.

1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
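The gene selection rule stated for panels (a-e) can be written as a small filter. The sketch below is one possible reading, assuming that both thresholds must be met within the same iteration and that importances and their standard deviations are available as per-iteration lists for each gene; the data layout, function name, and toy numbers are hypothetical.

```python
def select_classifier_genes(importance, importance_sd,
                            min_importance=0.02, min_ratio=2.0):
    """Return genes that pass both thresholds in at least one iteration.

    importance[gene] and importance_sd[gene] are lists with one entry per
    iteration (assumed layout). The ratio test |imp| / sd > min_ratio is
    written as |imp| > min_ratio * sd so that sd == 0 needs no special case.
    """
    selected = []
    for gene, values in importance.items():
        for imp, sd in zip(values, importance_sd[gene]):
            if imp > min_importance and abs(imp) > min_ratio * sd:
                selected.append(gene)
                break  # one passing iteration is enough
    return selected


# Toy importances for three genes over two iterations (illustrative only).
toy_importance = {"Ebf1": [0.05, 0.01], "Kit": [0.01, 0.015], "Mpo": [0.03, 0.03]}
toy_sd = {"Ebf1": [0.01, 0.01], "Kit": [0.001, 0.001], "Mpo": [0.02, 0.03]}
print(select_classifier_genes(toy_importance, toy_sd))
```

In this toy input only the first gene passes: the second never exceeds the importance threshold, and the third exceeds it but with too large a standard deviation.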

Supplementary Figure 8 FateID identifies progenitor stages of the pDC lineage.

(a) Self-organizing map (SOM) of z-score-transformed pseudo-temporal expression profiles along the pDC developmental trajectory derived from the t-SNE map in Fig. 3a. Example profiles are shown for four genes dynamically expressed during pDC differentiation. The black line indicates a local regression. The SOM has been computed for 711 cells with predicted pDC fate bias >0.15 derived from three independent experiments with n = 4 mice. (b) Shown are t-SNE maps highlighting normalized transcript levels of Il7r, Cd34, Csf1r and Ly6d for 1,802 cells from three independent experiments with n = 4 mice.

Supplementary Figure 9 FateID reveals fate bias in myeloid progenitors.

(a) A t-SNE map based on transcriptome similarity highlighting the origin of each cell is shown. The published dataset (ref. 1) comprises common myeloid progenitors (CMP), Irf8-GFP+MHC-II+ dendritic cell progenitors, Flt3+Csf1r+ monocyte progenitors, and Cd41+ megakaryocyte progenitors sorted from the CMP gate. (b) Shown is a t-SNE map highlighting clusters of cells with similar transcriptomes derived by RaceID3. Clusters 17, 1, 11, 5, and 8 were used as FateID target clusters for the erythrocyte, megakaryocyte, dendritic, monocyte, and granulocyte lineage. (c-g) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 2) (middle) is shown along with log2-transformed normalized expression of a lineage marker. Fate bias and marker gene expression are shown for (c) the megakaryocyte, (d) the dendritic, (e) the granulocyte, (f) the monocyte, and (g) the erythrocyte lineage. (h) Barplot comparing Spearman’s correlation coefficient between the expression levels of early lineage markers and fate bias computed by FateID and STEMNET. Error bars correspond to standard errors of Fisher’s z-transformed correlation values calculated across all cells after removal of target clusters (1,927 cells from n = 4 animals). P-values were derived from the difference of z-scores divided by the standard error assuming a standard normal distribution using Williams’ test (*P < 0.05, **P < 0.001). (i) Depicted is a t-SNE map highlighting expression of Flt3, showing that Flt3 does not discriminate between the monocyte and the dendritic cell lineage. (j-n) Shown are scatterplots comparing fate bias predicted by FateID and STEMNET for (j) the megakaryocyte, (k) the monocyte, (l) the granulocyte, (m) the dendritic cell, and (n) the erythrocyte lineage.
Although the predictions are overall correlated, STEMNET predicts more uniform levels across a larger fraction of the multipotent cell population. In (a-g) and (i-n) data for 2,370 cells from four independent experiments with n = 4 mice are shown.

1. Paul, F. et al. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell 163, 1663–1677 (2015).

2. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
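The error bars and P-values in panel (h) rest on Fisher’s z-transformation of correlation coefficients. The sketch below shows the transformation, its approximate standard error, and a z-test for the difference of two correlations. It is a simplified illustration under stated assumptions: it treats the two correlations as independent and uses the standard error 1/sqrt(n − 3), so it is not the dependent-correlation test named in the legend, and the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient (|r| < 1)."""
    return math.atanh(r)


def fisher_z_se(n):
    """Approximate standard error of Fisher's z for n observations."""
    return 1.0 / math.sqrt(n - 3)


def two_sided_p(z):
    """Two-sided P-value of a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_correlations(r1, r2, n):
    """z-test for r1 versus r2, both computed on n cells.

    Simplification: the correlations are treated as independent, which
    ignores that both are computed on the same cells.
    """
    se_diff = math.sqrt(2.0 / (n - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se_diff
    return z, two_sided_p(z)


# Illustrative correlation values; only the cell number is from the legend.
z_score, p_value = compare_correlations(0.45, 0.30, 1927)
print(fisher_z_se(1927), z_score, p_value)
```

With several thousand cells the standard error of z is small, so even modest differences between the correlations for the two methods can reach significance.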

Supplementary Figure 10 FateID reveals fate bias of intestinal epithelial progenitors.

(a) Shown is a t-SNE map highlighting clusters of cells with similar transcriptomes derived by RaceID3 on single-cell RNA-seq data of intestinal epithelial cells (ref. 1). (b) Heatmap of log2-transformed averaged normalized expression across clusters. The cluster number and color are indicated on the right. Only clusters with >3 cells were included. A hierarchical clustering dendrogram is shown on the right margin. (c-g) The fate bias, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias predicted by FateID (left) and STEMNET (ref. 2) (middle) is shown along with log2-transformed aggregated normalized expression of two lineage markers. Fate bias and marker gene expression are shown for (c) the Paneth cell, (d) the goblet cell, (e) the enteroendocrine, (f) the enterocyte, and (g) the tuft cell lineage. In (a-g) data for 505 cells from n = 3 animals are shown. (h) Barplot comparing Spearman’s correlation coefficient between the expression levels of early lineage markers and fate bias computed by FateID and STEMNET. Error bars correspond to standard errors of Fisher’s z-transformed correlation values calculated across all cells after removal of target clusters (303 cells from three independent experiments with n = 3 mice). P-values were derived from the difference of z-scores divided by the standard error assuming a standard normal distribution using Williams’ test (*P < 0.05, **P < 0.001). (i-l) Shown are t-SNE maps highlighting log2-transformed normalized transcript levels of (i) Neurog3, (j) Neurod1, (k) Muc2, and (l) Clca4. (m-q) Scatterplots comparing fate bias predicted by FateID and STEMNET for (m) the Paneth cell, (n) the goblet cell, (o) the enteroendocrine cell, (p) the enterocyte, and (q) the tuft cell lineage. Although the predictions are overall correlated, STEMNET predicts more uniform levels across a larger fraction of the multipotent cell population.
In (a-g) and (i-q) data for 505 cells from three independent experiments with n = 3 mice are shown.

1. Grün, D. et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 19, 266–277 (2016).

2. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).

Supplementary Figure 11 Monocle 2 identifies major branches.

(a) Lineage tree inferred by Monocle 2 (ref. 1) using reversed graph embedding. See Online methods for details. Cell types identified based on marker genes are highlighted. (b) Same as (a) but clusters inferred by Monocle 2 are highlighted. (c) Shown are Monocle 2 derived lineage trees highlighting log2-transformed normalized transcript levels of Kit and Ly6a (multipotency), Dntt (lymphoid lineage), Ebf1 (B cell lineage), Siglech (plasmacytoid dendritic cells), Cd74 (conventional dendritic cells), Elane (neutrophils), Gata1 (erythrocyte lineage) and Thy1 (innate lymphoid cells). Monocle 2, which is an established method for the derivation of multi-branched lineage trees, failed to resolve Thy1-positive innate lymphocyte progenitors from B cell progenitors and distributed pDC progenitors across several branches. Furthermore, the fixed assignment of a cell to a branch does not predict residual bias to one or more alternative lineages. A probabilistic view on the process of cell fate decision is more appropriate for capturing transitions between cell states biased towards distinct fates, and quantification of co-existing multi-lineage bias is not achieved by available algorithms for the prediction of lineage trees. Monocle 2 results are shown for 1,802 cells from three independent experiments with n = 4 mice.

1. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).

Supplementary Figure 12 Predicting the fate bias of human hematopoietic progenitors with FateID.

For this analysis, single-cell RNA-seq data generated by Smart-seq2 for individual 1 from Velten et al. (ref. 1) were used. (a) t-SNE map showing clusters of cells with similar transcriptomes derived by RaceID3. (b) Heatmap of log2-transformed averaged normalized expression of known marker genes across clusters. The cluster number and color are indicated on the right. Only clusters with >3 cells were included. A hierarchical clustering dendrogram is shown on the right margin. (c) Correlation of predicted fate bias and cell surface marker expression measured by index-sorting for FateID (left) and STEMNET (right). Surface expression of CD135 (encoded by FLT3) and CD45RA discriminates progenitors of neutrophils, monocytes, pDCs, and lymphocytes, on the one hand, and eosinophils/basophils/mast cells, erythrocytes and megakaryocytes, on the other hand. Surface expression of the two markers is positively correlated to fate bias towards the former group of lineages and inversely correlated to fate bias towards the latter group of lineages. FateID predictions show a more pronounced difference between the two groups. (d) Shown are t-SNE maps highlighting log-transformed fluorescence intensity measured by flow cytometry (index-sorting) for CD135 (top) and CD45RA (bottom). (e) The fate bias predicted by FateID, corresponding to the probability of a cell to be assigned to a given lineage, is color-coded in the t-SNE map. The fate bias is shown for the B cell lineage (left) and the pDC lineage (right). The black circle marks a population of cells with enhanced fate bias towards pDCs or B cells. (f) Shown are t-SNE maps highlighting log-transformed normalized transcript expression of the B cell lineage marker VPREB1 (left) and the pDC lineage marker IRF8 (right).
The expression domains of these markers overlap with the predicted domain of fate bias towards the respective lineage, and lymphoid progenitors with enhanced bias towards the pDC lineage exhibit co-expression of the two markers (black circle). (g) The fate bias predicted by STEMNET shown for the B cell lineage (left) and the pDC lineage (right). Black circle: see (f). (h) Scatterplots comparing fate bias predicted by FateID and STEMNET for the pDC lineage (left) and the B cell lineage (right). Although the predictions are overall correlated (Spearman’s correlation coefficient is 0.74 for the B cell lineage and 0.67 for the pDC lineage), STEMNET predicts more uniform levels across a larger fraction of progenitors. All panels show data from 1,035 cells sequenced from n = 1 donor.

1. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
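
Panel (h) summarizes the agreement between two sets of fate bias predictions with Spearman's correlation coefficient. The following is a minimal, self-contained sketch of that statistic (Pearson correlation computed on average ranks). The two input vectors are hypothetical values, not data from the study, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical fate bias towards one lineage for five cells, as predicted
# by two methods; the second method spans a much narrower range.
method_a = [0.05, 0.20, 0.40, 0.70, 0.95]
method_b = [0.30, 0.32, 0.35, 0.33, 0.40]
print(spearman(method_a, method_b))
```

Because the statistic depends only on ranks, two methods can correlate well even when one of them predicts a much narrower range of values, as in the hypothetical second vector.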

Supplementary Figure 13 Sorting strategy for in vitro differentiation of B cell and pDC progenitors.

(a) Sorting strategy for B cell versus plasmacytoid dendritic cell biased progenitors and staining of cultured progenitors. Cells were gated using SSC-A versus FSC-A (not shown). Only single cells were retained, using SSC-W versus SSC-H gating followed by FSC-W versus FSC-H gating (not shown). Thereupon, only lineage-negative (Lin-) cells were considered by exclusion of cells positive for the lineage markers TER-119, B220, Cd11b, Gr-1, SiglecH, Cd19 and Cd3ɛ (FITC). The next gate included only Flt3-positive (Flt3+) cells. The Flt3+ cells were plotted using Kit (BV510) versus Sca-1 (BV650), and cells with intermediate expression of Kit and Sca-1 were considered (common lymphoid progenitors, CLP). Finally, CLP cells were plotted using IL-7R (BV421) and CD34 (PE). For the culture experiments and for single cell sequencing, stringent gates were used to sort Cd34-Il7r+, Cd34+Il7r+ and Cd34+Il7r- cells. In addition, lymphoid-primed multipotent progenitors (LMPP) were sorted for culturing and for single cell sequencing. Fluorescence minus one controls were used for CD34 (PE) to set the sorting gates appropriately (not shown). The threshold for Il7r-positive cells was set according to the unstained control (not shown). (b) Exemplary flow cytometry plots of progenitors from one mouse after 7 days of culturing in either B cell medium or plasmacytoid dendritic cell (pDC) medium. The surface marker expression of Siglech and Cd19 was assessed to check for lineage commitment towards the pDC or B cell lineage. Independent experiments were performed for n = 5 animals. Data for flow cytometry plot (a) are from one experiment with n = 3 mice, whereas data for flow cytometry plot (b) are derived from one experiment with n = 1 mouse.

Supplementary Figure 14 Revised model of hematopoietic differentiation.

(a) t-SNE map with a principal curve fitted to all cells within the progenitor clusters (1,2,3,7). (b) Heatmap of z-score transformed pseudo-temporal expression profiles of a number of multipotency and lineage marker genes. Cells were ordered along the principal curve in (a) and profiles were smoothed by a local regression. The bottom panel depicts a local regression of the fate bias using the same temporal ordering. Successive expression of multipotency markers Kit, Ly6a, Ifitm1, Cd34, Cd48, and Flt3 is consistent with the ordering of multipotent progenitor (MPP) stages MPP1 to MPP4 as previously inferred by bulk measurements (refs. 1,2). Lineage markers comprise Gata2, Car2, Gata1, Pf4 for the erythrocyte/megakaryocyte lineage, Cebpa, Csf1r, Mpo for the granulocyte and monocyte lineage, Itgax for the conventional dendritic cell lineage, Irf8, Tcf4 for the plasmacytoid dendritic cell lineage, and Il7r, Rag1, Ebf1, Dntt for the B cell lineage. (c) Pictorial representation of the derived lineage tree. In (a) and (c) data are shown for 1,802 cells from three independent experiments with n = 4 mice. In (b) profiles are shown for 1,416 cells from three independent experiments with n = 4 mice. (d) Hematopoietic lineage tree inferred by StemID2. Only significant links are shown (P<0.01). The color of the link indicates the -log10 P value. The color of the vertices indicates the entropy. The thickness indicates the link score reflecting how densely a link is covered with cells. The lineage tree is consistent with independently derived fate bias estimates: erythrocytes (cluster 9) branch off from cluster 1, while clusters 2 and 3 give rise to granulocytes/monocytes (cluster 17), and clusters 2 and 7 comprise progenitors of pDCs (cluster 12) and B cells (cluster 10). (e) Barplot of StemID2 scores for hematopoietic clusters. Cluster 1, which shows the highest expression of HSC markers, such as Ifitm1 (ref. 2), receives the highest StemID2 score. In (a) and (b), only clusters with >5 cells were included and a link score cut-off of 0.5 was applied. The StemID2 computation in (d) and (e) was performed on 1,802 cells from three independent experiments with n = 4 mice.

1. Wilson, A. et al. Hematopoietic Stem Cells Reversibly Switch from Dormancy to Self-Renewal during Homeostasis and Repair. Cell 135, 1118–1129 (2008).

2. Cabezas-Wallscheid, N. et al. Identification of Regulatory Networks in HSCs and Their Immediate Progeny via Integrated Proteome, Transcriptome, and DNA Methylome Analysis. Cell Stem Cell 15, 507–522 (2014).
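
The processing described for panel (b), ordering cells along a pseudo-temporal axis, smoothing each gene's profile and z-score transforming it, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a centered moving average stands in for the local regression used in the figure, and the function names, window size and input values are hypothetical.

```python
import statistics


def order_by_pseudotime(expression, pseudotime):
    """Reorder per-cell expression values by increasing pseudotime."""
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    return [expression[i] for i in order]


def moving_average(values, window=3):
    """Centered moving average; a simple stand-in for local regression."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def z_score(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


# Hypothetical normalized expression of one gene in three cells, with the
# position of each cell along the principal curve as pseudotime.
expression = [5.0, 1.0, 3.0]
pseudotime = [0.9, 0.1, 0.5]

ordered = order_by_pseudotime(expression, pseudotime)
profile = z_score(moving_average(ordered, window=3))
print(profile)
```

The z-score transformation puts genes with very different expression levels on a common scale, so that the timing of up- or down-regulation, rather than absolute abundance, determines the heatmap pattern.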

Supplementary Figure 15 FateID recovers bipotent progenitor of monocytes and neutrophils.

(a) t-SNE map of the clusters identified by RaceID3 analysis of the murine hematopoietic transcriptome data from Olsson et al. (ref. 1). The cell types described in the original study were recovered and are indicated next to the clusters. (b) t-SNE maps highlighting normalized expression of lineage-specific markers. (c) FateID fate bias predictions for all lineages. (d) Scatterplot for the comparison of fate bias towards the monocyte and the neutrophil lineage. Aggregated marker gene expression is highlighted for monocytes (left; Irf8, Csf1r, Ly86), for neutrophils (middle; Gfi1, Cebpe, S100a8) and for the bi-potential progenitor (right; Ctsg, Elane, Mpo). The markers are taken from Figure 1c of Olsson et al. (ref. 1). The FateID analysis reveals that lineage-specific marker gene expression coincides with uni-lineage bias, while progenitors with similar fate biases for monocytes and neutrophils express markers shared by both lineages as well as low levels of the lineage-specific markers. This is consistent with the interpretation of Olsson et al. that these cells represent bi-potent progenitors. We note that the bi-potency of this transitional state was validated by in vitro differentiation assays in the original study. All panels show data from 382 cells sequenced from n = 3 mice.

1. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
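
The comparison in panel (d) separates cells with uni-lineage bias from cells with similar fate bias towards both lineages. A minimal sketch of such a distinction is given below. The thresholds `min_bias` and `max_gap`, the category labels and the example probabilities are hypothetical and are not taken from the study or from the FateID package.

```python
def classify_bias(p_monocyte, p_neutrophil, min_bias=0.3, max_gap=0.15):
    """Label a cell by comparing its fate bias towards two lineages.

    min_bias and max_gap are illustrative cut-offs (assumptions), not
    values used in the original analysis.
    """
    gap = p_monocyte - p_neutrophil
    if abs(gap) <= max_gap:
        # Similar bias towards both lineages: candidate bi-lineage state,
        # provided that both probabilities are appreciable.
        if min(p_monocyte, p_neutrophil) >= min_bias:
            return "bi-lineage"
        return "low bias"
    return "monocyte-biased" if gap > 0 else "neutrophil-biased"


# Hypothetical (monocyte, neutrophil) fate bias pairs for four cells.
cells = [(0.45, 0.40), (0.85, 0.10), (0.05, 0.90), (0.10, 0.05)]
for p_mono, p_neut in cells:
    print(p_mono, p_neut, classify_bias(p_mono, p_neut))
```

Fixed cut-offs are used here only to make the distinction concrete. In the figure, the relationship is assessed visually from the scatterplot together with marker gene expression.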

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Notes 1–5

Life Sciences Reporting Summary

About this article

Cite this article

Herman, J., Sagar & Grün, D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat Methods 15, 379–386 (2018). https://doi.org/10.1038/nmeth.4662

  • DOI: https://doi.org/10.1038/nmeth.4662
