Prospective identification of hematopoietic lineage choice by deep learning


Differentiation alters molecular properties of stem and progenitor cells, leading to changes in their shape and movement characteristics. We present a deep neural network that prospectively predicts lineage choice in differentiating primary hematopoietic progenitors using image patches from brightfield microscopy and cellular movement. Surprisingly, lineage choice can be detected up to three generations before conventional molecular markers are observable. Our approach allows identification of cells with differentially expressed lineage-specifying genes without molecular labeling.
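
The trained model (Supplementary Software 1) combines a convolutional network over brightfield image patches with a recurrent network that aggregates per-patch information and cellular displacement over time. The snippet below is a minimal, illustrative PyTorch sketch of this CNN-plus-RNN idea; all layer sizes, the patch dimensions and the choice of an LSTM are placeholder assumptions and do not reproduce the published architecture.

```python
# Illustrative sketch (not the published architecture): a small CNN embeds each
# brightfield patch, the embedding is concatenated with the cell's displacement,
# and a bidirectional LSTM turns the sequence into per-time-point lineage scores.
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(32 * 4 * 4, embed_dim)

    def forward(self, patches):                            # (T, 1, H, W)
        return self.fc(self.features(patches).flatten(1))  # (T, embed_dim)

class LineageRNN(nn.Module):
    def __init__(self, embed_dim=64, hidden=64):
        super().__init__()
        self.cnn = PatchCNN(embed_dim)
        self.rnn = nn.LSTM(embed_dim + 1, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, patches, displacement):   # patches: (T,1,H,W), displacement: (T,)
        x = torch.cat([self.cnn(patches), displacement.unsqueeze(1)], dim=1)
        h, _ = self.rnn(x.unsqueeze(0))                    # add a batch dimension
        return torch.sigmoid(self.head(h)).squeeze()       # one lineage score per patch

# Toy usage: 20 consecutive 27x27 patches of one tracked cell.
model = LineageRNN()
scores = model(torch.randn(20, 1, 27, 27), torch.rand(20))
print(scores.shape)  # torch.Size([20])
```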


Figure 1: Prediction of hematopoietic lineage choice up to three generations before molecular marker annotation using deep neural networks.
Figure 2: Subsets of cells with differential PU.1–eYFP expression can be distinguished two generations after experiment start.



Acknowledgements


We thank S. Pölsterl for comments on the manuscript. This work was supported by the German Federal Ministry of Education and Research (BMBF), the European Research Council starting grant (Latent Causes grant 259294, FJT), the BioSysNet (Bavarian Research Network for Molecular Biosystems, FJT), the German Research Foundation (DFG) within the SPPs 1395 and 1356 to FJT, the UK Medical Research Council (Career Development Fellowship MR/M01536X/1 to FBT), and the Swiss National Science Foundation grant 31003A_156431 and SystemsX IPhD grant 2014/244 to TS.

Author information




F. Buggenthin developed the image processing and machine learning pipeline, tracked cells and analyzed the data. F. Buettner developed the deep neural network approach with M.K. M. Strasser and M. Schwarzfischer contributed to image processing and data analysis. P.S.H. developed and conducted all experiments and tracked cells with M.E. T.S. developed and supervised data generation. D.L., K.D.K. and O.H. contributed to data generation and analysis. F.J.T. and T.S. initiated the study. C.M. supervised the study with F.J.T. and wrote the manuscript with F. Buggenthin and F. Buettner. All authors commented on the manuscript.

Corresponding authors

Correspondence to Timm Schroeder or Fabian J Theis or Carsten Marr.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Experimental setup.

Primary cells are extracted from the bone marrow of adult mice. Hematopoietic stem and progenitor cells (HSPCs) are purified by fluorescence-activated cell sorting (FACS) and cultivated on plastic with added CD16/32 antibody. Cells are imaged in brightfield and three fluorescence channels for up to 8 days. After the experiment is stopped, single HSPCs and their progeny are manually tracked and annotated.

Supplementary Figure 2 Lineage annotation and dataset overview.

A single HSPC at the root of a genealogy gives rise to an exponentially growing number of successors via repeated cell division. Cells committed to the GM (a) or MegE (b) lineage are called annotated and shown with dark blue and dark red lines, respectively. Cells with no annotated marker onset but with annotated sister, daughter, granddaughter etc. cells are called latent and shown with light blue and light red lines, respectively. (c) Genealogy of cells with no annotated marker onset. (d) All genealogies used in this study. (e) Number of cells in each experiment, subdivided into annotated, latent, and unknown cells. (f) Numbers of genealogies, single cells and image patches used.

Supplementary Figure 3 Deep neural network performance.

(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round, evaluated separately for annotated and latent cells. (b) F1 scores for cells with annotated marker onset (generations 0,1,2) and latent cells (generations -5 to -1).
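
For illustration, this per-generation evaluation can be sketched with scikit-learn as below, assuming one averaged lineage score, one true lineage label (1 = GM, 0 = MegE) and one generation index per cell; function and variable names are hypothetical.

```python
# Sketch of per-generation AUC and F1 evaluation on held-out cells.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def per_generation_metrics(scores, labels, generations, threshold=0.5):
    metrics = {}
    for g in np.unique(generations):
        mask = generations == g
        y, s = labels[mask], scores[mask]
        if len(np.unique(y)) < 2:          # AUC is undefined for a single class
            continue
        metrics[int(g)] = {
            "auc": roc_auc_score(y, s),
            "f1": f1_score(y, (s >= threshold).astype(int)),
        }
    return metrics

# Toy example: 200 cells spread over generations -3..2.
rng = np.random.default_rng(0)
gen = rng.integers(-3, 3, 200)
y = rng.integers(0, 2, 200)
s = np.clip(y * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)
print(per_generation_metrics(s, y, gen))
```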

Supplementary Figure 4 Prospective identification of lineage choice in non-genetically modified mice.

(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round and for non-genetically modified mice (‘Wild type’), evaluated separately for annotated and latent cells. (b) AUCs quantifying the performance of the trained models. For non-genetically modified cells (‘Wild type’), annotated cells (generations 0, 1, 2) and latent cells up to two generations before a marker onset (generations -2, -1) show AUCs higher than 0.78. The AUC drops four generations before marker onset. (c) F1 scores for non-genetically modified cells (‘Wild type’) with annotated marker onset (generations 0, 1, 2) and latent cells (generations -5 to -1).

Supplementary Figure 5 Unknown and latent cells predicted to be either GM or MegE committed express significantly different PU.1-eYFP levels.

Same analysis as described in Figure 2, but for (a) train-test round 2 and (b) train-test round 3. (c) Number of unknown or latent cells (white boxes) versus annotated cells (blue and red boxes), summed over all experiments. 2±1% (mean±sd, n=3 rounds) of GM and 15±8% of MegE marker onsets were annotated earlier than four generations after experiment start. Our approach prospectively detects lineage commitment from generation 3 onwards (black arrow), outperforming the conventional manual marker-based approach.

Supplementary Figure 6 Cell cycle averaged patch lineage score.

To improve the classification performance of the CNN, we average all patch lineage scores L over the cell cycle, resulting in a cell lineage score <L>. The prediction profile is shown as an example for the cell at generation -1 in Fig. 1b.
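
A minimal sketch of this averaging step, assuming the per-patch lineage scores of a single cell cycle are available as a list:

```python
# Cell lineage score <L>: the mean of all patch lineage scores L produced for one
# cell between its birth and its next division.
import numpy as np

def cell_lineage_score(patch_scores):
    """Average per-patch lineage scores over one cell cycle."""
    return float(np.mean(patch_scores))

# Example: per-patch scores of a single tracked cell.
print(cell_lineage_score([0.62, 0.71, 0.58, 0.80, 0.66]))  # 0.674
```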

Supplementary Figure 7 RNN-CNN outperforms feature-based methods with respect to calibration.

(a) Area under the curve (AUC, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for our RNN-CNN (red), two CNNs (AlexNet in blue, LeNet-5 in green), a random forest (yellow), a support vector machine (SVM, gray), and a conditional random field (CRF, white). Our method is on par with the CNN-only implementations and the random-forest-based classification. It outperforms the SVM and the CRF on annotated cells and most latent generations. (b) F1 scores (averaged lineage score threshold: 0.5, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for the methods in a and the algorithmic information theoretic prediction (AITP, brown). Due to better calibration, our RNN-CNN shows the highest scores and the lowest variance in most generations.

Supplementary Figure 8 Importance of morphological features for classification performance.

Gini importance for all 87 morphology features and cellular displacement as reported by the random forest model (mean±sd, n=3 rounds). Cell displacement and simple morphological features (minimum and maximum intensity, perimeter, major axis length) have the most impact on classification performance. More sophisticated shape (Zernike moments, ray features) or texture features (Tamura and Haralick features) are less relevant.
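
As an illustration, Gini importances can be obtained from a scikit-learn random forest as sketched below, assuming a feature matrix X (one column per morphology feature plus displacement) and binary lineage labels y; the toy data and feature names are made up. scikit-learn's feature_importances_ correspond to Gini (mean decrease in impurity) importance.

```python
# Sketch: rank features of a random forest by Gini importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def gini_importances(X, y, feature_names, n_estimators=500, seed=0):
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return [(feature_names[i], forest.feature_importances_[i]) for i in order]

# Toy example with three made-up features; feature 0 is the informative one.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.2 * rng.normal(size=300) > 0).astype(int)
for name, imp in gini_importances(X, y, ["displacement", "perimeter", "zernike_0"]):
    print(f"{name}: {imp:.3f}")
```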

Supplementary Figure 9 Feature differences for predicted cells.

(a) Displacement against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher displacement for the three train-test rounds. (b) Major axis length of an ellipsoidal fit to the 2D cell body against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher axis lengths for the three train-test rounds.
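
Both features can be derived from per-frame segmentation masks of a tracked cell. The sketch below illustrates this with scikit-image's regionprops (frame-to-frame centroid shift as displacement, major axis length of an ellipse fit to the cell body); it is an assumption-laden example rather than the pipeline used here.

```python
# Sketch: displacement and major axis length from per-frame binary masks of one cell.
import numpy as np
from skimage.measure import label, regionprops

def displacement_and_axis(masks):
    centroids, axes = [], []
    for mask in masks:
        props = regionprops(label(mask))[0]   # single object expected per frame
        centroids.append(props.centroid)
        axes.append(props.major_axis_length)
    centroids = np.asarray(centroids)
    displacement = np.linalg.norm(np.diff(centroids, axis=0), axis=1)
    return displacement, np.asarray(axes)

# Toy example: a square "cell" that moves by two pixels between two frames.
m1 = np.zeros((50, 50), bool); m1[10:20, 10:20] = True
m2 = np.zeros((50, 50), bool); m2[12:22, 10:20] = True
disp, axes = displacement_and_axis([m1, m2])
print(disp, axes)
```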

Supplementary Figure 10 Quality and robustness of cell identification.

(a) Cells with a size error of 10%-20% (see Supplementary Note 1) show very few outliers and smooth size quantifications (cell taken from the median, solid black line in d). (b) Cells with an error higher than 25% have a few measurement errors (most likely due to under-segmentation), yet the B-spline fit is smooth and most residuals are small (cell at the 95th percentile, dashed black line in d). (c) If cell identification is erroneous, a high variance in cell size quantification results in a size error of more than 45% (cell at the 99th percentile, dotted black line in d), which is unlikely for normal cell growth. (d) Fraction of all cells per error in cell size (root mean square error of the B-spline fit divided by mean cell size), for each experiment. For less than 5% of all cells in our dataset, the error in cell size was higher than 25% (dashed black line).
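
A minimal sketch of this quality measure, assuming a per-frame time series of cell sizes for one tracked cell; the spline degree and smoothing parameter are illustrative rather than the settings used in Supplementary Note 1.

```python
# Sketch: relative cell-size error = RMS residual of a smoothing B-spline fit to the
# per-frame cell sizes, divided by the mean cell size.
import numpy as np
from scipy.interpolate import UnivariateSpline

def cell_size_error(times, sizes, smoothing=None):
    spline = UnivariateSpline(times, sizes, k=3, s=smoothing)
    residuals = sizes - spline(times)
    return np.sqrt(np.mean(residuals ** 2)) / np.mean(sizes)

# Toy example: smooth growth with one under-segmentation outlier.
t = np.arange(30, dtype=float)
size = 200 + 4 * t + np.random.default_rng(2).normal(0, 5, 30)
size[14] *= 0.5                       # simulated segmentation error
print(f"relative size error: {cell_size_error(t, size):.2f}")
```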

Supplementary Figure 11 GUI for correction of marker onset annotations.

(a) Genealogy overview. Annotations can be inspected in every branch (black: unknown, blue: GM, red: MegE). (b) Annotation view. Dropdown menus allow selection of different experiments, genealogies and branches from the dataset. After selection, the quantifications for every cell in the branch are loaded. Cell size (first panel) is used to check cell identification quality and to calculate fluorescence concentrations. PU.1-eYFP concentration (second panel) is not used for annotation, whereas GATA1-mCherry (third panel) and CD16/32 antibody concentration (fourth panel) can be used to change the annotation timepoint of a full branch by clicking on the respective frame (red line). The identified single cell in brightfield (upper right image) and its fluorescence signal for the changed channel (lower right image) are shown for visual quality control. Note that high levels of GATA1-mCherry signal at movie start stem from the antibody staining used for HSPC purification by FACS.

Supplementary Figure 12 Image patch gallery of differentiated cells.

100 distinct cells with expressed markers for the GM (a) and MegE (b) lineage, respectively.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Note 1 (PDF 2256 kb)

Supplementary Software 1

CNN-RNN based prediction (ZIP 2442 kb)


Supplementary Video 1

Exemplary manual tracking of dividing HSPCs over 26 hours (MP4 23688 kb)


Supplementary Video 2

Exemplary automatic segmentation of dividing HSPCs over 3 days (WMV 14894 kb)



About this article


Cite this article

Buggenthin, F., Buettner, F., Hoppe, P. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat Methods 14, 403–406 (2017).
