
A systems biology pipeline identifies regulatory networks for stem cell engineering

An Author Correction to this article was published on 16 July 2019

This article has been updated


A major challenge for stem cell engineering is achieving a holistic understanding of the molecular networks and biological processes governing cell differentiation. To address this challenge, we describe a computational approach that combines gene expression analysis, previous knowledge from proteomic pathway informatics and cell signaling models to delineate key transitional states of differentiating cells at high resolution. Our network models connect sparse gene signatures with corresponding, yet disparate, biological processes to uncover molecular mechanisms governing cell fate transitions. This approach builds on our earlier CellNet and recent trajectory-defining algorithms, as illustrated by our analysis of hematopoietic specification along the erythroid lineage, which reveals a role for the EGF receptor family member, ErbB4, as an important mediator of blood development. We experimentally validate this prediction and perturb the pathway to improve erythroid maturation from human pluripotent stem cells. These results exploit an integrative systems perspective to identify new regulatory processes and nodes useful in cell engineering.


Fig. 1: GRN dynamics capture cell fate specification.
Fig. 2: ErbB signaling is implicated in erythroid differentiation.
Fig. 3: ErbB4 is required for robust erythroid development.
Fig. 4: ErbB4 genetic deficiency leads to blood defects in the ErbB4−/−HER4heart mouse model.
Fig. 5: Modulation of pathways downstream of ErbB signaling augments iPSC-derived RBC generation.

Data availability

All RNA-seq data have been deposited to the GEO database under GSE108128.

Change history

  • 16 July 2019

    In the version of this article initially published, the second NIH grant “R24-DK49216” to author George Q. Daley contained an error. The grant number should have read U54DK110805. The error has been corrected in the HTML and PDF versions of the article.


  1. Westerhoff, H. V. & Palsson, B. O. The evolution of molecular biology into systems biology. Nat. Biotechnol. 22, 1249–1252 (2004).

  2. Morris, S. A. & Daley, G. Q. A blueprint for engineering cell fate: current technologies to reprogram cell identity. Cell Res. 23, 33–48 (2013).

  3. Hanna, J. H., Saha, K. & Jaenisch, R. Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues. Cell 143, 508–525 (2010).

  4. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).

  5. Morris, S. A. et al. Dissecting engineered cell types and enhancing cell fate conversion via CellNet. Cell 158, 889–902 (2014).

  6. Radley, A. H. et al. Assessment of engineered cells using CellNet and RNA-seq. Nat. Protoc. 12, 1089–1102 (2017).

  7. Setty, M. et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).

  8. Shin, J. et al. Single-cell RNA-seq with Waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).

  9. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 1–11 (2014).

  10. Lummertz da Rocha, E. et al. Reconstruction of complex single-cell trajectories using CellRouter. Nat. Commun. 9, 892 (2018).

  11. Dzierzak, E. & Philipsen, S. Erythropoiesis: development and differentiation. Cold Spring Harb. Perspect. Med. 3, a011601 (2013).

  12. Tsai, F. Y. & Orkin, S. H. Transcription factor GATA-2 is required for proliferation/survival of early hematopoietic cells and mast cell formation, but not for erythroid and myeloid terminal differentiation. Blood 89, 3636–3643 (1997).

  13. Cantor, A. B. & Orkin, S. H. Transcriptional regulation of erythropoiesis: an affair involving multiple partners. Oncogene 21, 3368–3376 (2002).

  14. da Cunha, A. F. et al. Global gene expression reveals a set of new genes involved in the modification of cells during erythroid differentiation. Cell Prolif. 43, 297–309 (2010).

  15. Ding, K. et al. Genetic loci implicated in erythroid differentiation and cell cycle regulation are associated with red blood cell traits. Mayo Clin. Proc. 87, 461–474 (2012).

  16. Li, J. et al. Isolation and transcriptome analyses of human erythroid progenitors: BFU-E and CFU-E. Blood 124, 3636–3645 (2014).

  17. Rylski, M. et al. GATA-1-mediated proliferation arrest during erythroid maturation. Mol. Cell Biol. 23, 5031–5042 (2003).

  18. Goh, S.-H. et al. The human reticulocyte transcriptome. Physiol. Genomics 30, 172–178 (2007).

  19. Ideker, T., Ozier, O., Schwikowski, B. & Siegel, A. F. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18, S233–S240 (2002).

  20. Langfelder, P., Mischel, P. S. & Horvath, S. When is hub gene selection better than standard meta-analysis? PLoS ONE 8, e61505 (2013).

  21. Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).

  22. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007).

  23. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007).

  24. Tuncbag, N. et al. Simultaneous reconstruction of multiple signaling pathways via the prize-collecting Steiner forest problem. J. Comput. Biol. 20, 124–136 (2013).

  25. Robert-Moreno, A. et al. The Notch pathway positively regulates programmed cell death during erythroid differentiation. Leukemia 21, 1496–1503 (2007).

  26. Watanabe, S. et al. Loss of a Rho-regulated actin nucleator, mDia2, impairs cytokinesis during mouse fetal erythropoiesis. Cell Rep. 5, 926–932 (2013).

  27. Tanno, T. et al. High levels of GDF15 in thalassemia suppress expression of the iron regulatory protein hepcidin. Nat. Med. 13, 1096–1101 (2007).

  28. Jacquel, A. et al. Apoptosis and erythroid differentiation triggered by Bcr-Abl inhibitors in CML cell lines are fully distinguishable processes that exhibit different sensitivity to caspase inhibition. Oncogene 26, 2445–2458 (2007).

  29. Zhou, B. P. et al. HER-2/neu induces p53 ubiquitination via Akt-mediated MDM2 phosphorylation. Nat. Cell Biol. 3, 973–982 (2001).

  30. Le, X. F. et al. Heregulin-induced apoptosis is mediated by down-regulation of Bcl-2 and activation of caspase-7 and is potentiated by impairment of protein kinase C alpha activity. Oncogene 20, 8258–8269 (2001).

  31. Holbro, T. et al. The ErbB2/ErbB3 heterodimer functions as an oncogenic unit: ErbB2 requires ErbB3 to drive breast tumor cell proliferation. Proc. Natl Acad. Sci. USA 100, 8933–8938 (2003).

  32. Lee, H.-Y. et al. PPAR-α and glucocorticoid receptor synergize to promote erythroid progenitor self-renewal. Nature 522, 474–477 (2015).

  33. Orkin, S. H. & Zon, L. I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell 132, 631–644 (2008).

  34. Tidcombe, H. et al. Neural and mammary gland defects in ErbB4 knockout mice genetically rescued from embryonic lethality. Proc. Natl Acad. Sci. USA 100, 8281–8286 (2003).

  35. Doulatov, S. et al. Induction of multipotential hematopoietic progenitors from human pluripotent stem cells via respecification of lineage-restricted precursors. Cell Stem Cell 13, 459–470 (2013).

  36. Naresh, A. et al. The ERBB4/HER4 intracellular domain 4ICD is a BH3-only protein promoting apoptosis of breast cancer cells. Cancer Res. 66, 6412–6420 (2006).

  37. Bersell, K., Arab, S., Haring, B. & Kühn, B. Neuregulin1/ErbB4 signaling induces cardiomyocyte proliferation and repair of heart injury. Cell 138, 257–270 (2009).

  38. Li, B., Woo, R.-S., Mei, L. & Malinow, R. The neuregulin-1 receptor erbB4 controls glutamatergic synapse maturation and plasticity. Neuron 54, 583–597 (2007).

  39. Hahn, C.-G. et al. Altered neuregulin 1-erbB4 signaling contributes to NMDA receptor hypofunction in schizophrenia. Nat. Med. 12, 824–828 (2006).

  40. Flanagan, R. J. & Dunk, L. Haematological toxicity of drugs used in psychiatry. Hum. Psychopharmacol. 23, 27–41 (2008).

  41. Hänggi, P. et al. Functional plasticity of the N-methyl-d-aspartate receptor in differentiating human erythroid precursor cells. Am. J. Physiol. Cell Physiol. 308, C993–C1007 (2015).

  42. Kwan, W. et al. The central nervous system regulates embryonic HSPC production via stress-responsive glucocorticoid receptor signaling. Cell Stem Cell 19, 370–382 (2016).

  43. Kuramochi, Y. et al. Cardiac endothelial cells regulate reactive oxygen species-induced cardiomyocyte apoptosis through neuregulin-1beta/erbB4 signaling. J. Biol. Chem. 279, 51141–51147 (2004).

  44. Xu, Z. et al. Neuroprotection by neuregulin-1 following focal stroke is associated with the attenuation of ischemia-induced pro-inflammatory and stress gene expression. Neurobiol. Dis. 19, 461–470 (2005).

  45. van der Harst, P. et al. Seventy-five genetic loci influencing the human red blood cell. Nature 492, 369–375 (2012).

  46. Ludwig, T. E. et al. Derivation of human embryonic stem cells in defined conditions. Nat. Biotechnol. 24, 185–187 (2006).

  47. Sugimura, R. et al. Haematopoietic stem and progenitor cells from human pluripotent stem cells. Nature 545, 432–438 (2017).

  48. Lis, R. et al. Conversion of adult endothelium to immunocompetent haematopoietic stem cells. Nature 545, 439–445 (2017).

  49. Kobayashi, H. et al. Angiocrine factors from Akt-activated endothelial cells balance self-renewal and differentiation of haematopoietic stem cells. Nat. Cell Biol. 12, 1046–1056 (2010).

  50. Shukla, S. et al. Progenitor T-cell differentiation from hematopoietic stem cells using Delta-like-4 and VCAM-1. Nat. Methods 14, 531–538 (2017).

  51. Sato, T. et al. Neuregulin 1 type II-ErbB signaling promotes cell divisions generating neurons from neural progenitor cells in the developing zebrafish brain. PLoS ONE 10, e0127360 (2015).

  52. Faith, J. J. et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).

  53. Rosvall, M. & Bergstrom, C. T. An information-theoretic framework for resolving community structure in complex networks. Proc. Natl Acad. Sci. USA 104, 7327–7331 (2007).

  54. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  55. Liberzon, A. et al. The molecular signatures database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).

  56. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  57. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2015).

  58. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996).

  59. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  60. Lê Cao, K.-A., Boitard, S. & Besse, P. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12, 253 (2011).

  61. Snel, B., Lehmann, G., Bork, P. & Huynen, M. A. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 28, 3442–3444 (2000).

  62. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).

  63. Akhmedov, M. et al. PCSF: an R-package for network-based interpretation of high-throughput data. PLoS Comput. Biol. 13, e1005694 (2017).

  64. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).

  65. Vo, L. T. et al. Regulation of embryonic haematopoietic multipotency by EZH1. Nature 553, 506–510 (2018).

  66. Park, I.-H. et al. Reprogramming of human somatic cells to pluripotency with defined factors. Nature 451, 141–146 (2008).

  67. Chadwick, K. et al. Cytokines and BMP-4 promote hematopoietic differentiation of human embryonic stem cells. Blood 102, 906–915 (2003).

  68. Giarratana, M.-C. et al. Proof of principle for transfusion of in vitro-generated red blood cells. Blood 118, 5071–5079 (2011).

  69. North, T. E. et al. Prostaglandin E2 regulates vertebrate haematopoietic stem cell homeostasis. Nature 447, 1007–1011 (2007).

  70. Doulatov, S. et al. Drug discovery for Diamond-Blackfan anemia using reprogrammed hematopoietic progenitors. Sci. Transl. Med. 9, eaah5645 (2017).


The authors thank G. Corfas (University of Michigan) for sharing the ErbB4−/− HER4heart mutant mice, which were generated in 2003 by M. Gassmann (University of Basel) and colleagues34, and L.I. Zon (Boston Children’s Hospital) for the globin:eGFP transgenic fish. The authors also thank P. Eser for the ErbB inhibitor library and T. Rosanwo for cells and reagents, as well as R. Mathieu and the BCH Flow Cytometry Core, J. Osborne, B. Joughin, J. Das and A. Zweemer for technical advice. This work is supported by grants to G.Q.D. from the NIH National Institute of Diabetes and Digestive and Kidney Diseases (nos. R24-DK092760, U54DK110805) and National Heart, Lung, and Blood Institute (Progenitor Cell Translation Consortium, no. U01HL134812; and nos. R01-HL04880 and NIH R24-OD017870-01). Additional support was given to D.A.L. from NIH National Institute of General Medical Sciences (no. R01-GM069668). M.A.K. is supported by an NIH T32 Training Grant from BWH Hematology. L.T.V. was supported by the NSF Graduate Research Fellowship. J.M.F. is supported by an NIH T32 Training Grant from the NHLBI. T.E.N. is a Leukemia and Lymphoma Society Scholar. G.Q.D. is an associate member of the Broad Institute and was supported by the Howard Hughes Medical Institute and the Manton Center for Orphan Disease Research.

Author information




M.A.K., D.A.L. and G.Q.D. conceived the project. M.A.K., L.T.V., J.M.F., J.B., A.J.C. and S.L. performed experimental work and data interpretation. K.-K.W., J.J.C., P.C., T.E.N., D.A.L. and G.Q.D. supervised research and participated in project planning. M.A.K., T.E.N., D.A.L. and G.Q.D. prepared the manuscript.

Corresponding authors

Correspondence to Douglas A. Lauffenburger or George Q. Daley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 CellNet classification of erythroid microarrays.

(a) Classification probability of HSPC and erythroid microarrays using the original release of human CellNet. (b) CellNet classification of HSPC (n=127) and erythroid (n=164) microarrays by Random Forest (partitioned at 50% training/testing, 2-fold cross-validation) and classifier specificity (false positive rate of 5%) after training an erythroid GRN. (c) Status of the HSPC and erythroid GRNs across all cell types. GATA2 and GATA1 (d) expression and (e) first-order networks associated with the HSPC (blue) and erythroid (red) GRNs, with overlapping genes annotated in green. (f) Enriched biological processes (gene ontology) within the entire erythroid GRN (235 genes) and the first-order networks associated with each of the erythroid TFs. (g) Overlap between cell- and tissue-specific GRNs. (h) Classification probability, (i) GRN establishment and (j) network influence score (NIS) for a subset of microarrays derived from sorted populations corresponding to temporal stages of differentiation (CFU-E, ProE, IntE, LateE). For this analysis, the classified arrays were excluded from the training data.

Supplementary Figure 2 Analysis of principal component characteristics and gene expression dynamics.

GSEA enrichment for Hallmark gene sets was calculated for each sample as a preranked list with respect to the average of the entire erythroid dataset. The correlations of the GSEA normalized enrichment scores (NES) with the principal component coordinates were ranked to determine the relative influence of gene sets on (a) PC1 and (b) PC2. (c) Distribution of regulators (i.e. TFs, DNA binding factors) within gene clusters in (c): G1 (green), G2 (blue) and G3 (red) and the distribution of target-regulator interactions across gene clusters (G1-G3). (d-g) The importance of genes within the GRN to each GMM cluster (C2, C4, C5 and C6 from Fig. 2a) was determined by calculating the dot product of the score cluster centroid with the coordinates of each gene on the PCA loadings plot (Fig. 2c). The top 20 genes corresponding to clusters C2 (d), C4 (e), C5 (f) and C6 (g) are plotted, with the full datasets available in Supplementary Table 6. (h) K-means clustering of significantly modulated genes across stages of differentiation (derived from clusters C2, C4, C5 and C6 from the PCA analysis in Fig. 2a). The number of genes in each cluster is denoted in the upper right corner and individual clusters are annotated with significantly enriched (Bonferroni-corrected p < 0.05) gene ontology (GO) biological processes.
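
The centroid-loading ranking described for panels (d-g) can be illustrated with a minimal sketch. The gene names, coordinates and the helper `rank_genes_by_centroid` are hypothetical and are not taken from the study; only the dot-product idea comes from the caption.

```python
# Toy 2-D PCA coordinates (PC1, PC2); all numbers are invented for illustration.
def rank_genes_by_centroid(centroid, loadings, top_n=20):
    """Rank genes by the dot product of a cluster centroid (scores space)
    with each gene's coordinates on the loadings plot."""
    scores = {
        gene: sum(c * l for c, l in zip(centroid, coords))
        for gene, coords in loadings.items()
    }
    # Largest dot product first: genes pointing in the same direction as the cluster.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


centroid_c5 = (2.0, -1.0)  # hypothetical centroid of one GMM cluster
loadings = {
    "GENE_A": (0.5, -0.5),
    "GENE_B": (-0.4, 0.2),
    "GENE_C": (0.1, 0.6),
}
ranked = rank_genes_by_centroid(centroid_c5, loadings, top_n=2)
print(ranked)
```

A gene whose loading vector points toward the cluster centroid receives a large positive score, which is why this measure can serve as a simple proxy for the gene's contribution to that cluster.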

Supplementary Figure 3 Dissection of GRN loadings in RNA-seq datasets.

(a) Principal component scores plot and Gaussian Mixture Model (GMM) unsupervised clustering (S1-S4) of an RNA-seq dataset describing developmentally staged, purified populations of erythroblasts (GSE53983). The identity of samples in each cluster is represented in the graphical overlay. (b) Correlations between gene loadings from the erythroid GRN in microarray (Fig. 1d) and RNA-seq data (* = p < 0.05). The loading distances were calculated as the Euclidean distance to each GMM cluster centroid. C2, C4, C5 and C6 represent the GMM clusters from microarray data and S1-S4 are the GMM clusters from RNA-seq data. (c) Representative loading distances comparing microarray (C2 and C5; x-axis) to RNA-seq (S1 and S3; y-axis) data. (d) The top 30 genes corresponding to each RNA-seq cluster, represented as the normalized Euclidean distance to the cluster centroid. The bar colors correspond to the gene cluster membership (G1-G3; Fig. 1d) from microarray classification.
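
As a rough illustration of the loading distances in panels (b-d), the sketch below computes the Euclidean distance from each gene's loading to a cluster centroid. The caption does not state how distances were normalized; dividing by the largest distance is an assumption made here, and all names and values are hypothetical.

```python
import math


def normalized_centroid_distances(centroid, loadings):
    """Euclidean distance from each gene loading to a cluster centroid,
    scaled by the largest distance (the scaling rule is an assumption)."""
    distances = {gene: math.dist(centroid, xy) for gene, xy in loadings.items()}
    largest = max(distances.values())
    if largest == 0:
        # All genes sit exactly on the centroid; nothing to scale.
        return {gene: 0.0 for gene in distances}
    return {gene: d / largest for gene, d in distances.items()}


centroid_s1 = (0.0, 0.0)  # hypothetical centroid of one RNA-seq GMM cluster
loadings = {"GENE_A": (3.0, 4.0), "GENE_B": (0.0, 1.0), "GENE_C": (6.0, 8.0)}
scaled = normalized_centroid_distances(centroid_s1, loadings)
closest = min(scaled, key=scaled.get)
print(closest, scaled)
```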

Supplementary Figure 4 GRN dynamics of the T cell network and signaling during activation.

(a) Principal Component Analysis (PCA) scores of T cell microarrays, with Gaussian Mixture Model (GMM) derived clusters C1-C7. (b) Spatial organization of manual classifications and 95% confidence interval ellipses in PCA space, with PC1 related to treatments (HDAC inhibition, activation, resting, naïve) and PC2 related to cell type (CD4, CD8, NKT). Peripheral blood (PB) derived samples are shown with a solid line, and cord blood (CB) derived samples with a dashed line. (c) Distribution of manual classifications across unsupervised computational clusters C3 (primarily naïve PB CD4), C4 (PB CD8), C1 (primarily activated CB CD4) and C2 (activated PB CD4) from part (a). (d) Distance between loadings (genes) and cluster centroids, calculated as the dot product of their coordinates, with the top 5 genes shown for clusters C3, C4, C1 and C2. (e) Visualization of regulators from dynamic networks calculated using CLR inference across each of the clusters C3, C4, C1 and C2. Node size correlates with the degree (number of targets) and line width corresponds to the CLR Z-score (confidence of interaction) between regulators. (f) Gene signature distinguishing clusters C3 (naïve PB CD4; green) and C2 (activated PB CD4; blue) from (a), as determined by Least Absolute Shrinkage and Selection Operator (LASSO). (g) Protein-protein interaction (PPI) network built from the gene signature using the STRING database and the Prize Collecting Steiner Forest (PCSF) algorithm. PCSF parameters were ω = 3, β = 2, µ = 8 × 10⁻⁵. Non-LASSO nodes (Steiner nodes) are depicted with the size proportional to the degree and LASSO nodes (terminals) are depicted in green and blue, corresponding to the representation in part (f). (h) P-values ranking node enrichment (Fisher’s test for connections in the LASSO network relative to the full STRING network) and corresponding Gene Ontology annotations. (i) Enriched signaling pathways from the Reactome database. (j) Coexpression network comprising genes highly correlated (|r| > 0.86) with the LASSO signature. (k) Enrichment analyses for kinase perturbation and ligand regulation (both from LINCS L1000).
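
The CLR Z-score mentioned in panel (e) can be sketched as follows, based on the general description of the context likelihood of relatedness method (ref. 52). This is an illustrative toy implementation, not the authors' pipeline: the mutual-information values are invented, and excluding the diagonal from each gene's background distribution is an assumption.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """Return a CLR-style score matrix from a symmetric mutual-information matrix.

    For each pair (i, j), the MI value is z-scored against the background of
    gene i and of gene j, negative z-scores are clipped to zero, and the two
    are combined as sqrt(z_i^2 + z_j^2).
    """
    n = len(mi)
    stats = []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]  # assumption: skip diagonal
        stats.append((mean(background), pstdev(background)))

    def z(i, j):
        mu, sd = stats[i]
        return max(0.0, (mi[i][j] - mu) / sd) if sd > 0 else 0.0

    clr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            clr[i][j] = clr[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return clr


# Hypothetical mutual-information matrix for four genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
clr = clr_scores(mi)
print(clr[0][1], clr[0][2])
```

Because each pair is judged against the background of both genes, a moderately high mutual-information value only scores well if it stands out for both partners.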

Supplementary Figure 5 Reticulocyte microarray characterization.

(a) Pearson’s correlation comparing all genes on microarrays from distinct stages of differentiation (corresponding to C2, C4, C5 and C6 from Fig. 2a). (b) Gene Ontology (GO) enrichment for biological processes that are differentially regulated between C5 and C6. (c) All genes within representative GO categories with coordinated decreased (blue) or increased (red) expression during erythroid maturation and reticulocytosis.

Supplementary Figure 6 PLSDA validation of LASSO genes.

(a) Lambda was chosen for LASSO regression based upon the minimum mean square error (MSE). The number of genes corresponding to each value of lambda is annotated above the graph. (b) PLSDA model built upon the 27-gene LASSO signature, with validation via comparison to PLSDA models with the (c) Y block (binary classification) randomly permuted and (d) based on random selections (1000 permutations) of 27 genes.
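
The label-permutation idea behind panel (c) can be sketched with a generic permutation test. To keep the example self-contained, a simple difference of class means stands in for the PLSDA fit metric; that substitution, the toy data and the function names are all assumptions for illustration.

```python
import random


def class_separation(values, labels):
    """Absolute difference between the two class means
    (a stand-in for a PLSDA model-quality metric)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Empirical p-value: how often shuffled labels separate the classes
    at least as well as the true labels do."""
    rng = random.Random(seed)
    observed = class_separation(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if class_separation(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


values = [5.1, 4.8, 5.3, 5.0, 1.2, 0.9, 1.1, 1.4]  # invented signature scores
labels = [1, 1, 1, 1, 0, 0, 0, 0]                  # invented binary classes
p = permutation_p_value(values, labels)
print(p)
```

The same scheme extends to panel (d) by drawing random gene sets of matching size instead of shuffling labels.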

Supplementary Figure 7 Transcriptional regulatory network of hemoglobin genes.

(a) First-order connection network derived from connecting genes corresponding to the hemoglobin metabolism biological process (GO:0020027) within the CellNet global GRN. Significantly enriched regulators (p<0.05 by Fisher’s test for connections in the network compared to the global GRN) are shown in black, with community-derived modules depicted above in varying colors and the hemoglobin targets in green. (b) The degree of all regulators and hemoglobin target gene module membership. (e) Graphical representation of Module 1 with significantly enriched regulators (black) and a ranked list of the most enriched regulators by p-value (Fisher’s test).
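
The regulator enrichment test in panel (a) can be illustrated with a one-sided Fisher's exact test written as a hypergeometric tail. How the authors built the contingency table is not stated, so the counting scheme below (a regulator's connections in the subnetwork versus in the global network) and all numbers are assumptions.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test, P(X >= k) for a hypergeometric X.

    N: connections in the global network
    K: connections of the regulator in the global network
    n: connections in the subnetwork
    k: connections of the regulator in the subnetwork
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented counts: the regulator holds 3 of the 5 subnetwork connections.
p_enriched = fisher_enrichment_p(k=3, n=5, K=5, N=20)
print(p_enriched)
```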

Supplementary Figure 8 Signaling network parameter robustness analysis.

(a-b) Network characteristics upon varying the Prize Collecting Steiner Forest (PCSF) parameters, including (a) the degree penalty, µ and (b) the number of trees, ω and node prize scaling, β. The degree centrality was used to assess the changes in µ, demonstrating the decreasing influence of highly connected regulators at increasing values of µ for 100 permutations of random networks, as well as the LASSO network (red). The fraction of LASSO nodes (top) and the total node size (bottom) as a function of β and ω demonstrate a global maximum after an empirically determined threshold. Accordingly, the network composition changes with increasing β and ω, with decreasing density and centralization, yet increasing significant Gene Ontology (GO) terms at p<0.05 in larger networks. (c) Example of a highly centralized network with µ = 0 and examples of small (d) and medium (e) sized networks. The Steiner node (non-LASSO terminal) size is represented in proportion to the connectivity degree and LASSO terminals are shown in red and blue, corresponding to the representation in Fig. 2a. (f) Comparison of the maximum network betweenness and the P53 node betweenness, when comparing random networks (100 permutations) to the LASSO network, demonstrating that P53 has a significantly (p<0.001) more centralized role in the topology of the LASSO network than would be predicted randomly.
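
For orientation, the roles of µ, β and ω can be read off the objective commonly written for the prize-collecting Steiner forest problem in the literature (see refs. 24 and 63). Whether the analysis here used exactly this variant is an assumption; the symbols p(v), c(e), deg(v) and κ are introduced only for this sketch.

```latex
% Degree-penalized node prize: beta scales the prize, mu penalizes hubs
p'(v) = \beta \, p(v) - \mu \, \deg(v)

% Objective minimized over forests F = (V_F, E_F) with \kappa trees:
% prizes forfeited for excluded nodes, plus edge costs, plus a cost per tree
f'(F) = \sum_{v \notin V_F} p'(v) + \sum_{e \in E_F} c(e) + \omega \, \kappa
```

Under this form, raising µ lowers the effective prize of highly connected nodes, which matches the reduced influence of hubs described for panel (a), while β and ω trade off how many prize-carrying nodes are included against how many separate trees are allowed.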

Supplementary Figure 9 Correlation network threshold robustness analysis.

(a) Network topology upon varying correlation thresholds (absolute value of Pearson’s r) for the co-expression network from Fig. 2i-j. Insets show the limited range of correlation cutoffs from |r| > 0.8 to |r| = 1. (b) Representative networks across cutoffs from |r| > 0.92 to |r| > 0.8, represented as heatmaps of the Pearson’s r value. Predictions of enriched (c) transcription factors (ChEA/ENCODE), (d) kinase perturbations (downregulated genes upon kinase knockdown from LINCS L1000) and (e) ligand regulation (upregulated upon ligand stimulation from LINCS L1000) associated with each correlated gene set.
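
A threshold sweep of this kind can be sketched in a few lines. The expression values and gene names are invented, and the helper functions are hypothetical; only the idea of keeping gene pairs whose absolute Pearson correlation exceeds a cutoff comes from the caption.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold):
    """Keep gene pairs whose absolute correlation exceeds the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson_r(expr[g1], expr[g2])
        if abs(r) > threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 2.5, 2.0, 4.0],
}
edge_counts = {cutoff: len(coexpression_edges(expr, cutoff)) for cutoff in (0.80, 0.86, 0.92)}
print(edge_counts)
```

In this toy set the network loses edges only at the strictest cutoff, which is the kind of stability a robustness analysis is meant to reveal.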

Supplementary Figure 10 ErbB4 expression in primary human bone marrow and during in vitro RBC differentiation.

(a) Temporal morphological analysis of erythroid cells differentiated from BM CD34+ cells and corresponding ErbB4 gene expression relative to MCF7 cells. n=4 independent samples from 2 experiments; p=0.03 by Kruskal-Wallis rank sum test (unequal variances assumed based upon Levene’s test p=0.005). (b) ErbB4 gene expression (relative to MCF7 cells) in sorted populations from primary human bone marrow corresponding to early (CD71+GlyA-), intermediate (CD71+GlyA+) and late (CD71-GlyA+) stages of erythropoiesis. (c) Raw microarray gene expression of the ErbB family of receptor tyrosine kinases across stages of differentiation (clusters C2, C4, C5 and C6 from Fig. 2a). * = p<0.05 compared to all other clusters via ANOVA and post hoc Tukey HSD tests.

Supplementary Figure 11 Peripheral blood composition in Neratinib treated mice.

Percentages of reticulocytes, neutrophils, lymphocytes and monocytes in the peripheral blood of mice treated with vehicle (hydroxypropyl methylcellulose; HPMC) or Neratinib (60 mg/kg) for 7 days. * = p<0.05 by unpaired, two-sided t-test.

Supplementary Figure 12 Zebrafish phenotypes with ErbB4 morpholinos.

(a) gata1:dsred and (b) mpo:gfp transgenic zebrafish lines, demonstrating the relative proportions of erythroid and neutrophil phenotypes, respectively. ErbB4 morpholinos were injected at 0 hours post fertilization (hpf) and analysis was conducted at 48 hpf. n > 5 per condition, across at least 2 clutches. ** = p < 0.01 by two-sided, unpaired t-test.

Supplementary Figure 13 Blood defects in ErbB4-/-HER4heart mice.

(a) Bone marrow profiles (CD71 vs Ter119) of wild type, heterozygous, and homozygous ErbB4 knockouts, with quantification of each gate I-IV in (b). (c) Peripheral blood complete blood count results for RBC parameters such as Cellular Hemoglobin Concentration Mean (CHCM) and hemoglobin distribution width (HDW). (d) Quantification and images of gross spleen anatomy and weight. (a), (b), and (d): ErbB4+/+ and ErbB4-/- n=3; ErbB4+/- n=4. (c): ErbB4+/+ and ErbB4-/- n=8; ErbB4+/- n=17. * p < 0.05, ** p < 0.01, *** p < 0.001 by one-way ANOVA with post hoc Tukey HSD.

Supplementary Figure 14 Expression of ErbB4 targets upon inhibitor treatment.

(a) GSEA analysis of targets of ErbB4 CYT-1, ErbB4 CYT-2, and common targets3,4,28 upon treatment with Lapatinib (increased affinities for EGFR and HER2), as well as Neratinib, Dacomitinib, and Afatinib (pan ErbB inhibitors) for 24 hours. (b) GSEA plots of ErbB4 CYT-2 targets. (c) Expression of genes within the erythroid GRN (GMM clusters G1, G2, G3 from Fig. 2c) compared to all genes. * = p<0.05, indicating differential probability distributions via Kolmogorov-Smirnov (KS) test.
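
The distribution comparison in panel (c) rests on the two-sample Kolmogorov-Smirnov statistic, which can be sketched as below. The example computes only the D statistic (the largest gap between two empirical cumulative distributions) and not a p-value; the sample values are invented.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest absolute difference
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


grn_genes = [0.1, 0.2, 0.3, 0.4]    # hypothetical expression changes, GRN genes
all_genes = [0.35, 0.5, 0.6, 0.7]   # hypothetical expression changes, background
d_stat = ks_statistic(grn_genes, all_genes)
print(d_stat)
```

A D value near 1 means the two distributions barely overlap, whereas a value near 0 means they are nearly indistinguishable.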

Supplementary Figure 15 Flow cytometry gating schemes.

Gating schemes of representative flow cytometry plots (acquired on BD Fortessa cytometer and analyzed in FlowJo software) for (a) human bone marrow (BM) CD34+ differentiation (day 17) and (b) native mouse BM. Gating was performed by identifying cells on the FSC/SSC plots, excluding dead cells (DAPI+) and gating for GlyA/CD71 (human) and Ter119/CD71 (mouse). In all cases, gates were set based upon unstained controls and compensated with automated compensation with anti-mouse Igk and negative beads.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15

Reporting Summary

Supplementary Tables

Supplementary Tables 1–13


Cite this article

Kinney, M.A., Vo, L.T., Frame, J.M. et al. A systems biology pipeline identifies regulatory networks for stem cell engineering. Nat Biotechnol 37, 810–818 (2019).
