Prospective identification of hematopoietic lineage choice by deep learning

Buggenthin, Felix; Buettner, Florian; Hoppe, Philipp S; Endele, Max; Kroiss, Manuel; Strasser, Michael; Schwarzfischer, Michael; Loeffler, Dirk; Kokkaliaris, Konstantinos D; Hilsenbeck, Oliver; Schroeder, Timm; Theis, Fabian J; Marr, Carsten

doi:10.1038/nmeth.4182

Brief Communication
Published: 20 February 2017

Felix Buggenthin^1,na1,
Florian Buettner (ORCID: orcid.org/0000-0001-5587-6761)^1,2,na1,
Philipp S Hoppe^3,4,
Max Endele^3,
Manuel Kroiss^1,5,
Michael Strasser^1,
Michael Schwarzfischer^1,
Dirk Loeffler^3,4,
Konstantinos D Kokkaliaris^3,4,
Oliver Hilsenbeck^3,4,
Timm Schroeder (ORCID: orcid.org/0000-0002-9271-8815)^3,4,
Fabian J Theis (ORCID: orcid.org/0000-0002-2419-1943)^1,5 &
…
Carsten Marr (ORCID: orcid.org/0000-0003-2154-4552)^1

Nature Methods volume 14, pages 403–406 (2017)

Abstract

Differentiation alters molecular properties of stem and progenitor cells, leading to changes in their shape and movement characteristics. We present a deep neural network that prospectively predicts lineage choice in differentiating primary hematopoietic progenitors using image patches from brightfield microscopy and cellular movement. Surprisingly, lineage choice can be detected up to three generations before conventional molecular markers are observable. Our approach allows identification of cells with differentially expressed lineage-specifying genes without molecular labeling.

**Figure 1: Prediction of hematopoietic lineage choice up to three generations before molecular marker annotation using deep neural networks.**

**Figure 2: Subsets of cells with differential PU.1–eYFP expression can be distinguished two generations after experiment start.**

Acknowledgements

We thank S. Pölsterl for comments on the manuscript. This work was supported by the German Federal Ministry of Education and Research (BMBF), the European Research Council starting grant (Latent Causes grant 259294, FJT), the BioSysNet (Bavarian Research Network for Molecular Biosystems, FJT), the German Research Foundation (DFG) within the SPPs 1395 and 1356 to FJT, the UK Medical Research Council (Career Development Fellowship MR/M01536X/1 to FBT), and the Swiss National Science Foundation grant 31003A_156431 and SystemsX IPhD grant 2014/244 to TS.

Author information

Felix Buggenthin and Florian Buettner: These authors contributed equally to this work.

Authors and Affiliations

Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
Felix Buggenthin, Florian Buettner, Manuel Kroiss, Michael Strasser, Michael Schwarzfischer, Fabian J Theis & Carsten Marr
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
Florian Buettner
Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule (ETH) Zurich, Basel, Switzerland
Philipp S Hoppe, Max Endele, Dirk Loeffler, Konstantinos D Kokkaliaris, Oliver Hilsenbeck & Timm Schroeder
Research Unit Stem Cell Dynamics, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
Philipp S Hoppe, Dirk Loeffler, Konstantinos D Kokkaliaris, Oliver Hilsenbeck & Timm Schroeder
Department of Mathematics, Technische Universität München, Garching, Germany
Manuel Kroiss & Fabian J Theis

Contributions

F. Buggenthin developed the image processing and machine learning pipeline, tracked cells and analyzed the data. F. Buettner developed the deep neural network approach with M.K. M. Strasser and M. Schwarzfischer contributed to image processing and data analysis. P.S.H. developed and conducted all experiments and tracked cells with M.E.; T.S. developed and supervised data generation. D.L., K.D.K., and O.H. contributed to data generation and analysis. F.J.T. and T.S. initiated the study. C.M. supervised the study with F.J.T., and wrote the manuscript with F.B.U. and F.B.T. All authors commented on the manuscript.

Corresponding authors

Correspondence to Timm Schroeder, Fabian J Theis or Carsten Marr.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Experimental setup.

Primary murine bone marrow cells are extracted from the bone marrow of adult mice. Hematopoietic stem and progenitor cells (HSPCs) are purified by fluorescence activated cell sorting (FACS) and cultivated on plastic with added CD16/32 antibody. Cells are imaged in brightfield and three fluorescence channels for up to 8 days. After the experiment is stopped, single HSPCs and their progeny are manually tracked and annotated.

Supplementary Figure 2 Lineage annotation and dataset overview.

A single HSPC at the root of a genealogy gives rise to an exponentially growing number of successors via repeated cell division. Cells committed to the GM (a) or MegE (b) lineage are called annotated and shown with dark blue and dark red lines, respectively. Cells with no annotated marker onset but with annotated sister, daughter, grand-daughter etc. cells are called latent and shown with light blue and light red lines, respectively. (c) Genealogy of cells with no annotated marker onset. (d) All genealogies that were used in this study. (e) Number of cells in every experiment, subdivided into annotated, latent, and unknown cells. (f) Table of the number of used genealogies, single cells and image patches.

Supplementary Figure 3 Deep neural network performance.

(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round, evaluated separately for annotated and latent cells. (b) F₁ scores for cells with annotated marker onset (generations 0,1,2) and latent cells (generations -5 to -1).
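The two metrics named in this caption can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `roc_auc` and `f1_score`, the 0/1 label encoding, and the toy scores are assumptions made only for illustration. The AUC is computed here through its rank interpretation (the probability that a randomly chosen positive cell scores higher than a randomly chosen negative one), and the F₁ score is the harmonic mean of precision and recall at a fixed score threshold.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. labels use a hypothetical 0/1 encoding.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall after thresholding the scores."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up values, not data from the study)
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))   # 0.75
print(f1_score(toy_labels, toy_scores))  # 0.5
```

An AUC of 0.5 corresponds to random assignment and 1.0 to perfect separation, as stated in the caption. Unlike the AUC, the F₁ score depends on the chosen threshold, so it is also sensitive to how well the scores are calibrated.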

Supplementary Figure 4 Prospective identification of lineage choice in non-genetically modified mice.

(a) Receiver operating characteristic curves and area under the curve (AUC, 1.0 = perfect classification, 0.5 = random assignment) for each round and for non-genetically modified mice (‘Wild type’), evaluated separately for annotated and latent cells. (b) AUCs determine the performance of the trained models. For non-genetically modified cells (‘Wild type’), annotated cells (generations 0,1,2) and latent cells up to 2 generations before a marker onset (generations -2,-1) show AUCs higher than 0.78. AUC drops 4 generations before marker onset. (c) F₁ scores for non-genetically modified cells (‘Wild type’) with annotated marker onset (generation 0,1,2) and latent cells (-5 to -1).

Supplementary Figure 5 Unknown and latent cells predicted to be either GM or MegE committed express significantly different PU.1-eYFP levels.

Same analysis as described in Figure 2, but for (a) train-test round 2 and (b) train-test round 3. (c) Number of unknown or latent (white boxes) versus annotated cells (blue and red boxes), summed over all experiments. 2±1% (mean±sd, n=3 rounds) of GM and 15±8% of MegE marker onsets were annotated earlier than four generations after experiment start. Our approach is able to prospectively detect lineage commitment from generation 3 onwards (black arrow), outperforming the conventional manual marker-based approach.

Supplementary Figure 6 Cell cycle averaged patch lineage score.

To improve classification performance of the CNN, we average all patch lineage scores L over the cell cycle, resulting in a cell lineage score <L>. The prediction profile is shown exemplarily for the cell at generation -1 in Fig. 1b.
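The averaging step described here can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: the function name `cell_lineage_score` and the example values are hypothetical, and each patch lineage score L is assumed to be a single number per imaged time point of one cell.

```python
from statistics import fmean


def cell_lineage_score(patch_scores):
    """Average per-patch lineage scores L over one cell cycle to obtain <L>."""
    if not patch_scores:
        raise ValueError("a cell needs at least one image patch")
    return fmean(patch_scores)


# Hypothetical patch scores for one cell: the single outlier (0.9) is damped
# by averaging over all time points of the cell cycle.
scores_over_cell_cycle = [0.2, 0.3, 0.9, 0.2]
print(cell_lineage_score(scores_over_cell_cycle))
```

Averaging trades temporal resolution for robustness: a single noisy patch has less influence on the per-cell decision than it would have if each patch were classified on its own.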

Supplementary Figure 7 RNN-CNN outperforms feature-based methods with respect to calibration.

(a) Area under the curve (AUC, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for our RNN-CNN (red), two CNNs (AlexNet in blue, LeNet-5 in green), a random forest (yellow), a support vector machine (SVM, gray), and a conditional random field (CRF, white). Our method is on par with the CNN-only implementations and the random forest based classification. It outperforms the SVM and the CRF on annotated cells and most latent generations. (b) F₁ scores (averaged lineage score threshold: 0.5, mean±sd, n=3 rounds) for generations before and after an annotated marker onset for the methods in a and the algorithmic information theoretic prediction (AITP, brown). Due to better calibration, our RNN-CNN shows highest scores and lowest variance in most generations.

Supplementary Figure 8 Importance of morphological features for classification performance.

Gini importance for all 87 morphology features and cellular displacement as reported by the random forest model (mean±sd, n=3 rounds). Cell displacement and simple morphological features (minimum and maximum intensity, perimeter, major axis length) have the most impact on classification performance. More sophisticated shape (Zernike moments, ray features) or texture features (Tamura and Haralick features) are less relevant.
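As background on the measure used in this caption: Gini importance in a random forest is built from the decrease in Gini impurity that each split on a feature achieves, accumulated over the nodes and trees in which that feature is used. The sketch below shows only that building block for a single split. The function names, the toy feature values and the threshold are assumptions for illustration and are not taken from the study.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Decrease in Gini impurity when splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Toy feature (e.g. a displacement-like quantity) with made-up values
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_labels = ["GM", "GM", "MegE", "MegE"]
print(impurity_decrease(toy_values, toy_labels, threshold=2.5))  # 0.5
```

A feature whose splits separate the classes cleanly, as in the toy example, yields large decreases and therefore a high importance. Features that rarely produce informative splits contribute little.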

Supplementary Figure 9 Feature differences for predicted cells.

(a) Displacement against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher displacement for the three train-test rounds. (b) Major axis length of an ellipsoidal fit to the 2D cell body against generation after experiment start, for cells predicted to become GM and MegE, respectively. GM predicted cells are slightly shifted towards higher axis lengths for the three train-test rounds.

Supplementary Figure 10 Quality and robustness of cell identification.

(a) Cells with a size error of 10%-20% (see Supplementary Note 1) show very few outliers and smooth size quantifications (cell taken from the median, solid black line in d). (b) Cells with an error higher than 25% have a few measurement errors (most likely due to under-segmentation), yet the B-spline fit is smooth and the majority of residuals are small (cell at the 95th percentile, dashed black line in d). (c) If cell identification is erroneous, a high variance in cell size quantification results in a size error of more than 45% (cell at the 99th percentile, dotted black line in d), which is unlikely for normal cell growth. (d) Fraction of all cells per error in cell size (root mean square error of B-spline fit divided by mean cell size), for each experiment. For less than 5% of all cells in our dataset, the error in cell size was higher than 25% (dashed black line).
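The size error defined in panel d (root mean square error of the B-spline fit divided by mean cell size) can be written out as a short sketch. The spline fit itself is not reproduced: the fitted values are assumed to be precomputed and passed in, and the function name `size_error` and the example numbers are hypothetical.

```python
from math import sqrt


def size_error(observed_sizes, fitted_sizes):
    """Relative size error: RMSE between observed and fitted sizes / mean observed size.

    fitted_sizes is assumed to come from a smooth fit (a B-spline in the
    caption) evaluated at the same time points as observed_sizes.
    """
    if not observed_sizes or len(observed_sizes) != len(fitted_sizes):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed_sizes)
    squared = sum((o - f) ** 2 for o, f in zip(observed_sizes, fitted_sizes))
    rmse = sqrt(squared / n)
    mean_size = sum(observed_sizes) / n
    return rmse / mean_size


# Made-up cell sizes (arbitrary units) and a flat fitted curve
observed = [100.0, 110.0, 90.0, 100.0]
fitted = [100.0, 100.0, 100.0, 100.0]
error = size_error(observed, fitted)
print(f"{error:.3f}")   # relative error of about 0.071, i.e. roughly 7%
print(error > 0.25)     # False: below the 25% level marked in panel d
```

Dividing by the mean cell size makes the error comparable between small and large cells, which is what allows a single cut-off such as the 25% line in panel d to be applied across experiments.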

Supplementary Figure 11 GUI for correction of marker onset annotations.

(a) Genealogy overview. Annotations can be inspected in every branch (black: unknown, blue: GM, red: MegE). (b) Annotation view. Dropdown menus allow selection of different experiments, genealogies and branches from the dataset. After selection, the quantifications for every cell in the branch are loaded. Cell size (first panel) is used to check cell identification quality and to calculate fluorescence concentrations. PU.1-eYFP concentration (second panel) is not used for annotation, whereas GATA1-mCherry (third panel) and CD16/32 antibody concentration (fourth panel) can be used to change the annotation timepoint of a full branch by clicking on the respective frame (red line). The identified single cell in brightfield (upper right image) and its fluorescence signal for the changed channel (lower right image) are shown for visual quality control. Note that high levels of GATA1-mCherry signal at movie start stem from the antibody staining used for HSPC purification by FACS.

Supplementary Figure 12 Image patch gallery of differentiated cells.

100 distinct cells with expressed markers for the GM (a) and MegE (b) lineage, respectively.

About this article

Cite this article

Buggenthin, F., Buettner, F., Hoppe, P. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat Methods 14, 403–406 (2017). https://doi.org/10.1038/nmeth.4182

Received: 21 October 2015
Accepted: 17 January 2017
Published: 20 February 2017
Issue Date: April 2017
DOI: https://doi.org/10.1038/nmeth.4182
