Abstract
Differentiation alters molecular properties of stem and progenitor cells, leading to changes in their shape and movement characteristics. We present a deep neural network that prospectively predicts lineage choice in differentiating primary hematopoietic progenitors using image patches from brightfield microscopy and cellular movement. Surprisingly, lineage choice can be detected up to three generations before conventional molecular markers are observable. Our approach allows identification of cells with differentially expressed lineage-specifying genes without molecular labeling.
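The trained model is provided as Supplementary Software 1 (CNN-RNN based prediction). As a rough, hypothetical sketch of this kind of architecture (not the authors' published implementation), a small CNN embeds each brightfield patch and a recurrent network integrates the patch embeddings and per-frame cell displacement into a per-time-point lineage score. PyTorch is assumed; all layer sizes and names are illustrative.

```python
# Hypothetical sketch (PyTorch): a patch CNN feeding an RNN over a cell's
# time course. Layer sizes and names are illustrative, not the published
# architecture.
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    """Embeds a single brightfield image patch into a feature vector."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.LazyLinear(embed_dim)

    def forward(self, x):                    # x: (batch, 1, H, W)
        return self.fc(self.features(x).flatten(1))

class LineageRNN(nn.Module):
    """Integrates patch embeddings and displacement into lineage scores."""
    def __init__(self, embed_dim=64, hidden=128):
        super().__init__()
        self.cnn = PatchCNN(embed_dim)
        self.rnn = nn.LSTM(embed_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, patches, displacement):
        # patches: (batch, T, 1, H, W); displacement: (batch, T)
        b, t = patches.shape[:2]
        emb = self.cnn(patches.flatten(0, 1)).view(b, t, -1)
        seq = torch.cat([emb, displacement.unsqueeze(-1)], dim=-1)
        out, _ = self.rnn(seq)
        return torch.sigmoid(self.head(out)).squeeze(-1)  # score per frame
```

Averaging the per-frame scores over a cell cycle then yields the cell lineage score <L> used in the analyses (see Supplementary Fig. 6).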
Acknowledgements
We thank S. Pölsterl for comments on the manuscript. This work was supported by the German Federal Ministry of Education and Research (BMBF), the European Research Council starting grant (Latent Causes grant 259294, FJT), the BioSysNet (Bavarian Research Network for Molecular Biosystems, FJT), the German Research Foundation (DFG) within the SPPs 1395 and 1356 to FJT, the UK Medical Research Council (Career Development Fellowship MR/M01536X/1 to FBT), and the Swiss National Science Foundation grant 31003A_156431 and SystemsX IPhD grant 2014/244 to TS.
Author information
Contributions
F. Buggenthin developed the image processing and machine-learning pipeline, tracked cells and analyzed the data. F. Buettner developed the deep neural network approach with M.K.; M. Strasser and M. Schwarzfischer contributed to image processing and data analysis. P.S.H. developed and conducted all experiments and tracked cells with M.E.; T.S. developed and supervised data generation. D.L., K.D.K. and O.H. contributed to data generation and analysis. F.J.T. and T.S. initiated the study. C.M. supervised the study with F.J.T. and wrote the manuscript with F.B.U. and F.B.T. All authors commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Experimental setup.
Primary hematopoietic cells are extracted from the bone marrow of adult mice. Hematopoietic stem and progenitor cells (HSPCs) are purified by fluorescence-activated cell sorting (FACS) and cultivated on plastic with added CD16/32 antibody. Cells are imaged in brightfield and three fluorescence channels for up to 8 days. After the experiment is stopped, single HSPCs and their progeny are manually tracked and annotated.
Supplementary Figure 2 Lineage annotation and dataset overview.
A single HSPC at the root of a genealogy gives rise to an exponentially growing number of successors via repeated cell division. Cells committed to the GM (a) or MegE (b) lineage are called annotated and shown with dark blue and dark red lines, respectively. Cells with no annotated marker onset but annotated sister, daughter, granddaughter etc. cells are called latent and shown with light blue and light red lines, respectively. (c) Genealogy of cells with no annotated marker onset. (d) All genealogies that were used in this study. (e) Number of cells in every experiment, subdivided into annotated, latent, and unknown cells. (f) Table of the number of used genealogies, single cells and image patches.
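The latent labeling described above is essentially a propagation of lineage annotations through the genealogy tree. A minimal, hypothetical sketch of one such rule follows; it is simplified in that it only propagates labels to ancestors of annotated cells, whereas the definition above also covers, for example, unannotated sisters. The Cell class and function name are illustrative, not taken from the authors' code.

```python
# Hypothetical sketch: propagate lineage annotations to unannotated
# ancestors in a genealogy. Simplified relative to the legend, which also
# covers, e.g., unannotated sisters of annotated cells.
from dataclasses import dataclass, field

@dataclass
class Cell:
    label: str | None = None               # "GM", "MegE" or None (no onset)
    children: list["Cell"] = field(default_factory=list)

def propagate_latent(cell: Cell) -> set[str]:
    """Collect lineage labels in this subtree; mark unannotated cells whose
    subtree contains exactly one lineage as latent for that lineage."""
    found = set() if cell.label is None else {cell.label}
    for child in cell.children:
        found |= propagate_latent(child)
    if cell.label is None and len(found) == 1:
        cell.label = f"latent-{next(iter(found))}"
    return found
```

Under this rule, a cell whose subtree contains both lineages keeps label None, since it sits above the branch point.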
Supplementary Figure 3 Deep neural network performance.
(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round, evaluated separately for annotated and latent cells. (b) F1 scores for cells with annotated marker onset (generations 0, 1 and 2) and latent cells (generations -5 to -1).
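For reference, the per-round metrics shown here can be computed with standard library calls; a minimal sketch, assuming scikit-learn and toy labels and scores:

```python
# Minimal sketch, assuming scikit-learn: the AUC and F1 metrics shown here,
# computed from toy labels and averaged lineage scores.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

y_true = np.array([1, 0, 1, 1, 0])            # 1 = GM, 0 = MegE (toy labels)
scores = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # averaged lineage scores <L>

auc = roc_auc_score(y_true, scores)                 # 1.0 perfect, 0.5 random
f1 = f1_score(y_true, (scores >= 0.5).astype(int))  # threshold 0.5
print(f"AUC = {auc:.2f}, F1 = {f1:.2f}")
```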
Supplementary Figure 4 Prospective identification of lineage choice in non-genetically modified mice.
(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round and for non-genetically modified mice (‘Wild type’), evaluated separately for annotated and latent cells. (b) AUCs quantify the performance of the trained models. For non-genetically modified cells (‘Wild type’), annotated cells (generations 0, 1 and 2) and latent cells up to two generations before a marker onset (generations -2 and -1) show AUCs higher than 0.78. The AUC drops four generations before marker onset. (c) F1 scores for non-genetically modified cells (‘Wild type’) with annotated marker onset (generations 0, 1 and 2) and latent cells (generations -5 to -1).
Supplementary Figure 5 Unknown and latent cells predicted to be either GM or MegE committed express significantly different PU.1-eYFP levels.
Same analysis as described in Figure 2, but for (a) train-test round 2 and (b) train-test round 3. (c) Number of unknown or latent (white boxes) versus annotated cells (blue and red boxes), summed over all experiments. 2±1% (mean±sd, n=3 rounds) of GM and 15±8% of MegE marker onsets were annotated earlier than four generations after experiment start. Our approach prospectively detects lineage commitment from generation 3 onwards (black arrow), outperforming the conventional manual marker-based approach.
Supplementary Figure 6 Cell cycle averaged patch lineage score.
To improve classification performance of the CNN, we average all patch lineage scores L over the cell cycle, resulting in a cell lineage score <L>. As an example, the prediction profile is shown for the cell at generation -1 in Fig. 1b.
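A toy numerical illustration of this averaging step (Python):

```python
# Toy illustration: the cell lineage score <L> is the mean of the per-frame
# patch lineage scores L over one cell cycle (division to division).
import numpy as np

patch_scores = np.array([0.62, 0.55, 0.71, 0.66])  # L per frame, one cycle
cell_score = patch_scores.mean()                   # <L> = 0.635
```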
Supplementary Figure 7 RNN-CNN outperforms feature-based methods with respect to calibration.
(a) Area under the curve (AUC, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for our RNN-CNN (red), two CNNs (AlexNet in blue, LeNet-5 in green), a random forest (yellow), a support vector machine (SVM, gray), and a conditional random field (CRF, white). Our method is on par with the CNN-only implementations and the random forest-based classification. It outperforms the SVM and the CRF on annotated cells and most latent generations. (b) F1 scores (averaged lineage score threshold: 0.5, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for the methods in a and the algorithmic information theoretic prediction (AITP, brown). Due to better calibration, our RNN-CNN shows the highest scores and lowest variance in most generations.
Supplementary Figure 8 Importance of morphological features for classification performance.
Gini importance for all 87 morphology features and cellular displacement as reported by the random forest model (mean±sd, n=3 rounds). Cell displacement and simple morphological features (minimum and maximum intensity, perimeter, major axis length) have the most impact on classification performance. More sophisticated shape (Zernike moments, ray features) or texture features (Tamura and Haralick features) are less relevant.
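Gini importances of this kind can be read directly from a fitted random forest; a minimal sketch, assuming scikit-learn, where a toy matrix stands in for the 87 morphology features plus displacement:

```python
# Minimal sketch, assuming scikit-learn: Gini importances as reported by a
# fitted random forest. The toy matrix stands in for the 87 morphology
# features plus displacement (88 columns).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))                       # toy feature matrix
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int) # toy lineage labels

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]    # most important first
print(order[:5], rf.feature_importances_[order[:5]])
```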
Supplementary Figure 9 Feature differences for predicted cells.
(a) Displacement against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher displacement for the three train-test rounds. (b) Major axis length of an ellipsoidal fit to the 2D cell body against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher axis lengths for the three train-test rounds.
Supplementary Figure 10 Quality and robustness of cell identification.
(a) Cells with a size error of 10%-20% (see Supplementary Note 1) show very few outliers and smooth size quantifications (cell taken from the median, solid black line in d). (b) Cells with an error higher than 25% have a few measurement errors (most likely due to under-segmentation), yet the B-spline fit is smooth and the majority of residuals is small (cell at 95th percentile, dashed black line in d). (c) If cell identification is erroneous, a high variance in cell size quantification results in a size error of more than 45% (cell at 99th percentile, dotted black line in d), which is unlikely for normal cell growth. (d) Fraction of all cells per error in cell size (root mean square error of B-spline fit divided by mean cell size), for each experiment. For less than 5% of all cells in our dataset, the error in cell size was higher than 25% (dashed black line).
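A minimal sketch of this size-error metric, assuming SciPy's smoothing splines and a toy growth curve (spline settings are illustrative, not the authors' exact parameters):

```python
# Minimal sketch, assuming SciPy: the size error is the root mean square
# error of a smoothing B-spline fit to the cell-size time course, divided
# by the mean cell size. Data and spline settings are toy values.
import numpy as np
from scipy.interpolate import UnivariateSpline

t = np.arange(50, dtype=float)                       # frame index
size = 100 + 0.8 * t + np.random.default_rng(0).normal(0, 3, 50)

spline = UnivariateSpline(t, size, k=3, s=50 * 9.0)  # cubic smoothing spline
rmse = np.sqrt(np.mean((size - spline(t)) ** 2))
size_error = rmse / size.mean()                      # >0.25 would flag a cell
```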
Supplementary Figure 11 GUI for correction of marker onset annotations.
(a) Genealogy overview. Annotations can be inspected in every branch (black: unknown, blue: GM, red: MegE). (b) Annotation view. Dropdown menus allow selection of different experiments, genealogies and branches from the dataset. After selection, the quantifications for every cell in the branch are loaded. Cell size (first panel) is used to check cell identification quality and to calculate fluorescence concentrations. PU.1-eYFP concentration (second panel) is not used for annotation, whereas GATA1-mCherry (third panel) and CD16/32 antibody concentration (fourth panel) can be used to change the annotation timepoint of a full branch by clicking on the respective frame (red line). The identified single cell in brightfield (upper right image) and its fluorescence signal for the changed channel (lower right image) are shown for visual quality control. Note that high levels of GATA1-mCherry signal at movie start stem from the antibody staining used for HSPC purification by FACS.
Supplementary Figure 12 Image patch gallery of differentiated cells.
100 distinct cells with expressed markers for the GM (a) and MegE (b) lineage, respectively.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–12 and Supplementary Note 1 (PDF 2256 kb)
Supplementary Software 1
CNN-RNN based prediction (ZIP 2442 kb)
Supplementary Video 1
Exemplary manual tracking of dividing HSPCs over 26 hours (MP4 23688 kb)
Supplementary Video 2
Exemplary automatic segmentation of dividing HSPCs over 3 days (WMV 14894 kb)
About this article
Cite this article
Buggenthin, F., Buettner, F., Hoppe, P. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat Methods 14, 403–406 (2017). https://doi.org/10.1038/nmeth.4182