Differentiation alters molecular properties of stem and progenitor cells, leading to changes in their shape and movement characteristics. We present a deep neural network that prospectively predicts lineage choice in differentiating primary hematopoietic progenitors using image patches from brightfield microscopy and cellular movement. Surprisingly, lineage choice can be detected up to three generations before conventional molecular markers are observable. Our approach allows identification of cells with differentially expressed lineage-specifying genes without molecular labeling.
We thank S. Pölsterl for comments on the manuscript. This work was supported by the German Federal Ministry of Education and Research (BMBF), the European Research Council starting grant (Latent Causes grant 259294, FJT), the BioSysNet (Bavarian Research Network for Molecular Biosystems, FJT), the German Research Foundation (DFG) within the SPPs 1395 and 1356 to FJT, the UK Medical Research Council (Career Development Fellowship MR/M01536X/1 to FBT), and the Swiss National Science Foundation grant 31003A_156431 and SystemsX IPhD grant 2014/244 to TS.
The authors declare no competing financial interests.
Integrated supplementary information
Primary murine bone marrow cells are extracted from the bone marrow of adult mice. Hematopoietic stem and progenitor cells (HSPCs) are purified by fluorescence activated cell sorting (FACS) and cultivated on plastic with added CD16/32 antibody. Cells are imaged in brightfield and three fluorescence channels for up to 8 days. After the experiment is stopped, single HSPCs and their progeny are manually tracked and annotated.
A single HSPC at the root of a genealogy gives rise to an exponentially growing number of descendants via repeated cell division. Cells committed to the GM (a) or MegE (b) lineage are called annotated and shown with dark blue and dark red lines, respectively. Cells with no annotated marker onset of their own but with annotated sister, daughter, granddaughter, etc. cells are called latent and shown with light blue and light red lines, respectively. (c) Genealogy of cells with no annotated marker onset. (d) All genealogies that were used in this study. (e) Number of cells in every experiment, subdivided into annotated, latent, and unknown cells. (f) Table of the number of genealogies, single cells and image patches used.
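The three cell categories in this caption can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the caption: cells are numbered heap-style (root = 1, daughters of cell i are 2i and 2i + 1), and "latent" is read as "no annotation of its own, but an annotated sister or an annotated descendant at any depth". The function names and the toy genealogy are hypothetical.

```python
from typing import Dict, Optional


def sister(cell: int) -> Optional[int]:
    """Sister cell under the assumed heap numbering; the root has none."""
    return cell ^ 1 if cell > 1 else None


def has_annotated_descendant(cell: int, annotations: Dict[int, Optional[str]]) -> bool:
    """True if any daughter, granddaughter, ... of `cell` carries a lineage label."""
    for daughter in (2 * cell, 2 * cell + 1):
        if daughter in annotations:
            if annotations[daughter] is not None:
                return True
            if has_annotated_descendant(daughter, annotations):
                return True
    return False


def cell_status(annotations: Dict[int, Optional[str]]) -> Dict[int, str]:
    """Map every cell to 'annotated', 'latent' or 'unknown' (assumed reading of the caption)."""
    status = {}
    for cell, label in annotations.items():
        sis = sister(cell)
        sister_annotated = sis is not None and annotations.get(sis) is not None
        if label is not None:
            status[cell] = "annotated"
        elif sister_annotated or has_annotated_descendant(cell, annotations):
            status[cell] = "latent"
        else:
            status[cell] = "unknown"
    return status


# Toy genealogy with three generations; only cell 4 shows a marker onset.
genealogy = {1: None, 2: None, 3: None, 4: "GM", 5: None, 6: None, 7: None}
status = cell_status(genealogy)
```

In this toy example, cells 1 and 2 are latent because cell 4 is their descendant, cell 5 is latent because cell 4 is its sister, and cells 3, 6 and 7 remain unknown.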
(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round, evaluated separately for annotated and latent cells. (b) F1 scores for cells with annotated marker onset (generations 0,1,2) and latent cells (generations -5 to -1).
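For readers unfamiliar with the two metrics, the following sketch computes them from scratch. It is not the authors' evaluation code: AUC is obtained through its rank interpretation (the probability that a randomly chosen positive cell scores higher than a randomly chosen negative one, ties counting half), and the F1 score uses a score threshold of 0.5 with "greater than or equal" as an assumed tie rule. Labels and scores are made-up values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = positive class) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)  # 3 of 4 pairs ranked correctly
f1 = f1_score(labels, scores)  # one TP, one FP, one FN
```

AUC depends only on the ranking of the scores, whereas F1 depends on where the threshold falls, so a well-ranked but poorly calibrated model can have a high AUC and a low F1.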
Supplementary Figure 4 Prospective identification of lineage choice in non-genetically modified mice.
(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round and for non-genetically modified mice (‘Wild type’), evaluated separately for annotated and latent cells. (b) AUCs quantify the performance of the trained models. For non-genetically modified cells (‘Wild type’), annotated cells (generations 0,1,2) and latent cells up to 2 generations before a marker onset (generations -2,-1) show AUCs higher than 0.78. The AUC drops 4 generations before marker onset. (c) F1 scores for non-genetically modified cells (‘Wild type’) with annotated marker onset (generations 0,1,2) and latent cells (generations -5 to -1).
Supplementary Figure 5 Unknown and latent cells predicted to be either GM or MegE committed express significantly different PU.1-eYFP levels.
Same analysis as described in Figure 2, but for (a) train-test round 2 and (b) train-test round 3. (c) Number of unknown or latent cells (white boxes) versus annotated cells (blue and red boxes), summed over all experiments. 2±1% (mean±sd, n=3 rounds) of GM and 15±8% of MegE marker onsets were annotated earlier than four generations after experiment start. Our approach is able to prospectively detect lineage commitment from generation 3 onwards (black arrow), outperforming the conventional manual marker-based approach.
To improve the classification performance of the CNN, we average all patch lineage scores L over the cell cycle, resulting in a cell lineage score <L>. As an example, the prediction profile is shown for the cell at generation -1 in Fig. 1b.
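The averaging step can be written in a few lines. This is only a sketch of the arithmetic described in the caption: the function name is hypothetical, the per-patch scores are made-up values, and the choice of a plain unweighted mean over all time points of one cell cycle is an assumption where the caption simply says "average".

```python
from statistics import mean
from typing import Sequence


def cell_lineage_score(patch_scores: Sequence[float]) -> float:
    """Average the per-patch lineage scores L of one cell into a single score <L>."""
    if not patch_scores:
        raise ValueError("a cell needs at least one image patch")
    return mean(patch_scores)


# Hypothetical patch scores for one cell, one value per imaged time point.
patch_scores = [0.2, 0.4, 0.9]
score = cell_lineage_score(patch_scores)
```

Averaging suppresses the influence of individual noisy patches, so a single outlier frame has less effect on the per-cell prediction than it would on any per-patch prediction.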
(a) Area under the curve (AUC, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for our RNN-CNN (red), two CNNs (AlexNet in blue, LeNet-5 in green), a random forest (yellow), a support vector machine (SVM, gray), and a conditional random field (CRF, white). Our method is on par with the CNN-only implementations and the random forest-based classification. It outperforms the SVM and the CRF on annotated cells and most latent generations. (b) F1 scores (averaged lineage score threshold: 0.5, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for the methods in a and the algorithmic information theoretic prediction (AITP, brown). Due to better calibration, our RNN-CNN shows the highest scores and the lowest variance in most generations.
Gini importance for all 87 morphology features and cellular displacement as reported by the random forest model (mean±sd, n=3 rounds). Cell displacement and simple morphological features (minimum and maximum intensity, perimeter, major axis length) have the most impact on classification performance. More sophisticated shape (Zernike moments, ray features) or texture features (Tamura and Haralick features) are less relevant.
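As background on what Gini importance measures, the sketch below computes the Gini impurity of a set of class labels and the weighted impurity decrease produced by one split. In a random forest, the Gini importance of a feature is conventionally the sum of such decreases over all nodes that split on that feature, averaged over trees. The function names and the toy labels are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[str], left: Sequence[str],
                      right: Sequence[str], n_total: int) -> float:
    """Impurity decrease of one split, weighted by the share of samples reaching the node."""
    n = len(parent)
    children = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return (n / n_total) * (gini_impurity(parent) - children)


# Toy node with 4 GM and 4 MegE cells, split on a hypothetical feature threshold.
parent = ["GM"] * 4 + ["MegE"] * 4
left = ["GM", "GM", "GM", "MegE"]
right = ["GM", "MegE", "MegE", "MegE"]
decrease = impurity_decrease(parent, left, right, n_total=len(parent))
```

Here the parent impurity is 0.5 and each child has impurity 0.375, so the split contributes a decrease of 0.125 to the importance of the feature it uses.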
(a) Displacement against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher displacement for the three train-test rounds. (b) Major axis length of an ellipsoidal fit to the 2D cell body against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher axis lengths for the three train-test rounds.
(a) Cells with a size error of 10%-20% (see Supplementary Note 1) show very few outliers and smooth size quantifications (cell taken from the median, solid black line in d). (b) Cells with an error higher than 25% have a few measurement errors (most likely due to under-segmentation), yet the B-spline fit is smooth and most residuals are small (cell at 95th percentile, dashed black line in d). (c) If cell identification is erroneous, a high variance in cell size quantification results in a size error of more than 45% (cell at 99th percentile, dotted black line in d), which is unlikely for normal cell growth. (d) Fraction of all cells per error in cell size (root mean square error of B-spline fit divided by mean cell size), for each experiment. For fewer than 5% of all cells in our dataset, the error in cell size was higher than 25% (dashed black line).
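The size error defined in panel d (root mean square error of the fit divided by mean cell size) can be sketched as follows. The fitted values are taken as given here, since the B-spline fit itself is described in Supplementary Note 1 and is not reproduced. The function name and the numbers are hypothetical, and using the mean of the measured sizes as the denominator is an assumption.

```python
import math
from typing import Sequence


def relative_size_error(measured: Sequence[float], fitted: Sequence[float]) -> float:
    """RMSE between measured and fitted cell sizes, divided by the mean measured size."""
    if len(measured) != len(fitted) or not measured:
        raise ValueError("measured and fitted must be non-empty and of equal length")
    mse = sum((m - f) ** 2 for m, f in zip(measured, fitted)) / len(measured)
    mean_size = sum(measured) / len(measured)
    return math.sqrt(mse) / mean_size


# Hypothetical size track (arbitrary units) and a flat fit through it.
measured = [100.0, 110.0, 90.0, 100.0]
fitted = [100.0, 100.0, 100.0, 100.0]
error = relative_size_error(measured, fitted)  # about 7% of the mean size
exceeds_dashed_line = error > 0.25  # the 25% level marked in panel d
```

Normalizing by the mean size makes the error comparable between small and large cells, which is what allows a single percentage level to be drawn across all experiments.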
(a) Genealogy overview. Annotations can be inspected in every branch (black: unknown, blue: GM, red: MegE). (b) Annotation view. Dropdown menus allow the selection of different experiments, genealogies and branches from the dataset. After selection, the quantifications for every cell in the branch are loaded. Cell size (first panel) is used to check cell identification quality and to calculate fluorescence concentrations. PU.1-eYFP concentration (second panel) is not used for annotation, whereas GATA1-mCherry (third panel) and CD16/32 antibody concentration (fourth panel) can be used to change the annotation timepoint of a full branch by clicking on the respective frame (red line). The identified single cell in brightfield (upper right image) and its fluorescence signal for the changed channel (lower right image) are shown for visual quality control. Note that high levels of GATA1-mCherry signal at movie start stem from the antibody staining used for HSPC purification by FACS.
100 distinct cells with expressed markers for the GM (a) and MegE (b) lineage, respectively.
Supplementary Figures 1–12 and Supplementary Note 1 (PDF 2256 kb)
CNN-RNN based prediction (ZIP 2442 kb)
Exemplary manual tracking of dividing HSPCs over 26 hours (MP4 23688 kb)
Exemplary automatic segmentation of dividing HSPCs over 3 days (WMV 14894 kb)
About this article
Cite this article
Buggenthin, F., Buettner, F., Hoppe, P. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat Methods 14, 403–406 (2017). https://doi.org/10.1038/nmeth.4182