Abstract
Molecular noise is a natural phenomenon that is inherent to all biological systems1,2. How stochastic processes give rise to the robust outcomes that support tissue homeostasis remains unclear. Here we use single-molecule RNA fluorescent in situ hybridization (smFISH) on mouse stem cells derived from haematopoietic tissue to measure the transcription dynamics of three key genes that encode transcription factors: PU.1 (also known as Spi1), Gata1 and Gata2. We find that infrequent, stochastic bursts of transcription result in the co-expression of these antagonistic transcription factors in the majority of haematopoietic stem and progenitor cells. Moreover, by pairing smFISH with time-lapse microscopy and the analysis of pedigrees, we find that although individual stem-cell clones produce descendants that are in transcriptionally related states—akin to a transcriptional priming phenomenon—the underlying transition dynamics between states are best captured by stochastic and reversible models. As such, a stochastic process can produce cellular behaviours that may be incorrectly inferred to have arisen from deterministic dynamics. We propose a model whereby the intrinsic stochasticity of gene expression facilitates, rather than impedes, the concomitant maintenance of transcriptional plasticity and stem cell robustness.
Data availability
All source data used to generate figures are available within the manuscript files or at the GitHub repository (https://github.com/justincwheat/Single-Molecule-Imaging-of-Transcription-Dynamics-in-Somatic-Stem-Cells) associated with this manuscript. Further information and reasonable requests for resources, reagents and data should be directed to the corresponding author. For data used for generating figures related to kin correlation analysis or simulations (Figs. 2, 4, Extended Data Figs. 8 and 9), separate .mat files have been provided as Supplementary Data 1 and also uploaded to the GitHub repository listed above or are generated upon running the associated scripts. All data are available from the corresponding author upon reasonable request. Source data are provided with this paper.
Code availability
Software written for parameter estimation and stochastic simulations is provided in Supplementary Data 2 (FSP.m, getKLD.m, GSSA.m). Software relevant for Figs. 3 and 4 can also be found in Supplementary Data 2: the code for KCA (KCA.m), generating 3-cell frequency matrices (ThreePtFreqs.m), testing different molecular cutoffs (KCA_thresholdtesting.mlx), and calculating time spent in each state (GenerateAllTrees.m). Data structures for each colony are also provided (Colony[#].mat). All scripts and data files have also been published in a publicly available repository at https://github.com/justincwheat/Single-Molecule-Imaging-of-Transcription-Dynamics-in-Somatic-Stem-Cells. All software generated by other groups and used in this study is listed in Supplementary Table 7.
References
Levsky, J. M. & Singer, R. H. Gene expression and the myth of the average cell. Trends Cell Biol. 13, 4–6 (2003).
Elowitz, M. B., Levine, A. J., Siggia, E. D. & Swain, P. S. Stochastic gene expression in a single cell. Science 297, 1183–1186 (2002).
Raser, J. M. & O’Shea, E. K. Control of stochasticity in eukaryotic gene expression. Science 304, 1811–1814 (2004).
Bar-Even, A. et al. Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636–643 (2006).
Gandhi, S. J., Zenklusen, D., Lionnet, T. & Singer, R. H. Transcription of functionally related constitutive genes is not coordinated. Nat. Struct. Mol. Biol. 18, 27–34 (2011).
Huh, D. & Paulsson, J. Random partitioning of molecules at cell division. Proc. Natl Acad. Sci. USA 108, 15004–15009 (2011).
Lestas, I., Vinnicombe, G. & Paulsson, J. Fundamental limits on the suppression of molecular fluctuations. Nature 467, 174–178 (2010).
Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).
Tusi, B. K. et al. Population snapshots predict early haematopoietic and erythroid hierarchies. Nature 555, 54–60 (2018).
Femino, A. M., Fay, F. S., Fogarty, K. & Singer, R. H. Visualization of single RNA transcripts in situ. Science 280, 585–590 (1998).
Torre, E. et al. Rare cell detection by single-cell RNA sequencing as guided by single-molecule RNA FISH. Cell Syst. 6, 171–179.e5 (2018).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Tsanov, N. et al. smiFISH and FISH-quant – a flexible single RNA detection approach with super-resolution capability. Nucleic Acids Res. 44, e165 (2016).
Chen, H. M., Pahl, H. L., Scheibe, R. J., Zhang, D. E. & Tenen, D. G. The Sp1 transcription factor binds the CD11b promoter specifically in myeloid cells in vivo and is essential for myeloid-specific promoter activity. J. Biol. Chem. 268, 8230–8239 (1993).
Koschmieder, S., Rosenbauer, F., Steidl, U., Owens, B. M. & Tenen, D. G. Role of transcription factors C/EBPα and PU.1 in normal hematopoiesis and leukemia. Int. J. Hematol. 81, 368–377 (2005).
Rekhtman, N., Radparvar, F., Evans, T. & Skoultchi, A. I. Direct interaction of hematopoietic transcription factors PU.1 and GATA-1: functional antagonism in erythroid cells. Genes Dev. 13, 1398–1411 (1999).
Zhang, P. et al. PU.1 inhibits GATA-1 function and erythroid differentiation by blocking GATA-1 DNA binding. Blood 96, 2641–2648 (2000).
Rosenbauer, F. et al. Acute myeloid leukemia induced by graded reduction of a lineage-specific transcription factor, PU.1. Nat. Genet. 36, 624–630 (2004).
Steidl, U. et al. Essential role of Jun family transcription factors in PU.1 knockdown-induced leukemic stem cells. Nat. Genet. 38, 1269–1277 (2006).
Will, B. et al. Minimal PU.1 reduction induces a preleukemic state and promotes development of acute myeloid leukemia. Nat. Med. 21, 1172–1181 (2015).
Skinner, S. O. et al. Single-cell analysis of transcription kinetics across the cell cycle. eLife 5, e12175 (2016).
Giladi, A. et al. Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis. Nat. Cell Biol. 20, 836–846 (2018).
Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20–e31 (2016).
Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Chou, S. T. et al. Graded repression of PU.1/Sfpi1 gene transcription by GATA factors regulates hematopoietic cell fate. Blood 114, 983–994 (2009).
Doré, L. C., Chlon, T. M., Brown, C. D., White, K. P. & Crispino, J. D. Chromatin occupancy analysis reveals genome-wide GATA factor switching during hematopoiesis. Blood 119, 3724–3733 (2012).
Grass, J. A. et al. GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling. Proc. Natl Acad. Sci. USA 100, 8811–8816 (2003).
Singer, Z. S. et al. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014).
Gillespie, D. T. A general method of numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976).
Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Hoppe, P. S. et al. Early myeloid lineage choice is not initiated by random PU.1 to GATA1 protein ratios. Nature 535, 299–302 (2016).
Buggenthin, F. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat. Methods 14, 403–406 (2017).
Strasser, M. K. et al. Lineage marker synchrony in hematopoietic genealogies refutes the PU.1/GATA1 toggle switch paradigm. Nat. Commun. 9, 2697 (2018).
Arinobu, Y. et al. Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell 1, 416–427 (2007).
Laslo, P. et al. Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126, 755–766 (2006).
Hormoz, S. et al. Inferring cell-state transition dynamics from lineage trees and endpoint single-cell measurements. Cell Syst. 3, 419–433.e8 (2016).
Loeffler, D. et al. Mouse and human HSPC immobilization in liquid culture by CD43- or CD44-antibody coating. Blood 131, 1425–1429 (2018).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Hilsenbeck, O. et al. Software tools for single-cell tracking and quantification of cellular and molecular properties. Nat. Biotechnol. 34, 703–706 (2016).
Acknowledgements
We thank D. Shechter, K. Gritsman, R. Coleman, J. Biswas, E. Tutucci, M. V. Ugalde and R. Pisczcatowski for discussions; F. Mueller for assistance with FISH-QUANT; M. Elowitz and S. Hormoz for the scripts used for KCA; M. Lopez-Jones for assistance in probe design; D. Loeffler and T. Schroeder for input on time-lapse imaging of HSC; and D. Sun for assistance with flow cytometry and cell sorting. R.H.S. is a senior fellow of the Howard Hughes Medical Institute. A.B. is an external professor of the Santa Fe Institute. This research was supported by the Ruth L. Kirschstein National Research Service Award F30GM122308-03 and MSTP training grant T32GM007288-43 to J.C.W., U01DA047729 to R.H.S. and R01CA217092 to U.S. U.S. was supported as a Research Scholar of the Leukemia and Lymphoma Society and is the Diane and Arthur B. Belfer Faculty Scholar in Cancer Research of the Albert Einstein College of Medicine. This work was supported through the Albert Einstein Cancer Center core support grant (P30CA013330), and the Stem Cell Isolation and Xenotransplantation Core Facility (NYSTEM grant #C029154) of the Ruth L. and David S. Gottesman Institute for Stem Cell Research and Regenerative Medicine.
Author information
Contributions
J.C.W., U.S. and R.H.S. conceptualized the study and designed experiments. J.C.W., A.B. and Y.S. conceptualized mathematical models. J.C.W. performed all experiments and generated all data in the manuscript. J.C.W. performed the mRNA analyses, transcriptional parameter fitting, stochastic simulations, scRNA-seq analyses, and kinship analyses. M.W. provided essential scripts for scRNA-seq analyses. Y.S. and A.B. developed the analyses related to the history of state transitions conditional on pedigree structure. J.C.W. wrote the manuscript and generated all figures and data visualizations. J.C.W., U.S., R.H.S., A.I.S., Y.S., A.B. and M.W. reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Thomas Gregor, Ellen Rothenberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Transcriptional dynamics of genes conditional on PU.1 state.
a, b, Cumulative distribution function (CDF) of spot intensity (a) and histogram of signal-to-noise ratio (SNR) of spot intensity to local background intensity (b) are shown for all spots that passed intensity and 3D point-spread function (PSF) fit thresholding in FISH-QUANT. c, Probability densities for fluorescence (corresponding to mRNA molecules) in HPC-7 cells for Cy3-, Alexa Fluor 594- and Cy5-labelled readout probes. Insets are XY and XZ average PSFs for each fluorophore. The overlaid line is the fit to a Gaussian distribution. More than 10,000 spots were obtained per fluorophore. d, Representative images of three-colour smFISH for PU.1 (Cy5, red), Gata2 (Cy3, white) and Gata1 (AF594, green) in HPC-7 cells. Scale bar, 5 μm. e, Bivariate distributions of Gata1 and Gata2 (left), Gata2 and PU.1 (middle) and PU.1 and Gata1 (right) in two independent experiments (n > 400 cells per experiment) with HPC-7 cells. f, Representative images of multiplexed smFISH between PU.1 and eight other haematopoietic genes in Kit+Lin− bone marrow from wild-type mice (n = 258–2,488 cells for each gene, derived from a single experiment; scale bar, 5 μm). g, Probability distribution for PU.1 mRNA per cell in KL cells from bone marrow from wild-type mice. Overlaid are the high (red) and low (blue) components of the two-component negative binomial distribution fitted to the data. h, Comparison of PU.1 bursting kinetics between high and low states. Left, representative images from smFISH for PU.1 with a single, large transcription site in the nucleus. Middle, frequency of cells with the indicated number of active PU.1 transcription sites. Right, frequency distribution of summed nascent mRNA per cell in each PU.1 state. i, Schematic demonstrating a hypothetical transcriptional phase portrait. j, Phase portraits for each gene based on the PU.1 state of the cell.
Extended Data Fig. 2 Comparative analysis of smFISH and scRNA-seq.
a, CDF plots of mRNA per cell for five scRNA-seq datasets and smFISH. Data are normalized to the maximum count for each gene in each dataset. b, Calculated Gini index for seven transcription factor mRNAs in each scRNA-seq dataset (white through to black) and smFISH (red). c, CDF plots of Gini indices for all five scRNA-seq datasets (See Supplementary Table 2 for gene list). d, Schematic of hierarchical clustering followed by random forest classification to identify important variables for cluster assignment. e, Variable importance plotted against Gini index for four scRNA-seq datasets. The bottom and right panels show marginal distributions of Gini index and variable importance, respectively. f, Plot of average mutual information (top) or average absolute value of the Pearson’s correlation coefficient (bottom) versus normalized abundance of n = 200 randomly selected genes against all other genes in the dataset. The r values listed are the correlation coefficients. See Supplementary Discussion for further details on the analyses performed.
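The Gini index in panels b, c and e summarizes how unevenly a transcript is distributed across cells. The caption does not say which estimator was used, so what follows is only a minimal Python sketch based on the standard rank-weighted formula; the function name `gini_index` and the example counts are hypothetical and are not taken from the authors' scripts.

```python
def gini_index(counts):
    """Gini index of non-negative per-cell counts.

    Returns 0.0 when every cell has the same count and approaches 1.0
    when nearly all molecules sit in a single cell.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    # Rank-weighted sum over the sorted values (ranks run from 1 to n).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical mRNA counts per cell for one gene.
evenly_expressed = [5, 5, 5, 5]
bursty = [0, 0, 0, 20]
print(gini_index(evenly_expressed), gini_index(bursty))
```

With a finite sample of n cells the largest value this estimator can return is (n − 1)/n, which matters when comparing datasets of very different sizes.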
Extended Data Fig. 3 Summary statistics of mRNA copy number for primary KL.
a, Representative images of CMPs, GMPs and MEPs stained by smFISH for PU.1, Gata1 and Gata2. Scale bars, 5 μm. Arrows point to CMPs co-expressing all three mRNAs. b, Boxplots of mRNA count per cell, overlaid with single-cell mRNA values (dots). The pink box is the 95% confidence interval, the red line is the mean expression, the grey box is ±s.e.m. c, Table of summary statistics for each gene. Data for a–c are derived from two experiments (CMPs and MEPs) or a single experiment (GMPs). The sample size is listed in c.
Extended Data Fig. 4 Spot detection in FISH-QUANT and spot calling in T lymphocytes.
a, b, Comparison of raw (a) and filtered (b) smFISH images from CMPs (representative of more than 2 experiments in CMPs; spot quality is consistent with all reported experiments in this manuscript). The insets show line intensity plots; the white line on the cells indicates from where the plots were obtained. Scale bars, 10 μm. c, Average PSF in XY (left columns) and XZ (right columns) for each gene from all detected spots from the CMPs dataset. d, e, Empirical (left) versus theoretical (middle) PSF and residuals (right) in the XY (d) and XZ (e) planes. f, CDFs for all spots passing the initial intensity thresholding for filtered intensity (top row), squared residuals (second row) and width of spots in X, Y, and Z in nanometres (third to fifth rows, respectively). Spots are separated on the basis of those arising from cells with more than five copies of mRNA per cell, between two and five copies per cell, and one copy per cell. Discarded spots that failed 3D fitting are shown in orange. g, mRNA detection in primary CD4+CD8+ thymocytes (n = 136 for Gata1, n = 154 for PU.1).
Extended Data Fig. 5 Gating strategy to assign CMP to states.
a, Representative images of CMPs in different states. Scale bar, 10 μm. b, Gating scheme for assigning CMPs to transcriptional states. See Supplementary Discussion for details on the gating strategy. The t-SNE plot demonstrates the proximity of states to one another and to immunophenotypic GMPs and MEPs. Images and analyses derived from experimental datasets reported in Fig. 1 and Extended Data Fig. 3. c, Frequency distribution of transcriptional bursting for each gene in each transcriptional state. The x axis is the number of active alleles. d, Top, schematic of ‘states’ being the consequence of simple transcriptional noise of the LES state (left) versus truly separate transcriptional states (right) that require transition events (arrows). Bottom, time-dependent behaviour of simulated cells in a noise only (grey) or state transition system (red) shown as a bivariate plot of Gata1 + Gata2 copy number against PU.1 copy number. T indicates the elapsed simulation time as a fraction of the final time. e, f, Gillespie simulations of state transitions, modulating half-life alone. If a transition to another state occurs by noise alone, the cell changes the mRNA half-life of only the mRNA defining that state. e, f, Endpoint states reached in the simulations (n = 10,000) (e) and 1,000 representative simulation trajectories (f), colour-coded on the final endpoint state. Each panel is a different factor change in the mRNA half-life, with the far-left panel as the reference (that is, the half-lives used in Fig. 2), and the other panels showing 2× (second from left), 3× (second from right), and 4× (far-right).
Extended Data Fig. 6 Seventy-two-hour progeny of HSCs.
a, Representative images of HSC progeny. PU.1, red; Gata2, cyan; Gata1, yellow. Transcription sites are demarcated with boxes. Full arrows indicate triple-positive cells, and the arrowhead marks a megakaryocyte. Representative images from two separate experiments. b, CDFs for mRNA counts per HSC progeny. The number of cells with greater than or equal to 1 mRNA per cell is indicated. Two separate experiments, with n values indicated on the graphs. c, Bivariate distributions of PU.1 versus Gata1 (left) and PU.1 versus Gata2 (right).
Extended Data Fig. 7 State assignments for HSC progeny.
a, Gating strategy. Left, removal of megakaryocytes occurs first. Right, cells with more than 10 copies of Gata1 are assigned to G1/2H, whereas cells with more than 150 copies of PU.1 are assigned to macrophage. b, Probability density distributions for PU.1 (left) and Gata2 (right) with overlaid fits for a two-component negative binomial distribution amongst cells after removing megakaryocytes, G1/2H, and macrophage. c, Bivariate distribution of the same cells. Contrary to the case in CMPs, the population of Gata2highPU.1high HSC progeny all had morphological characteristics similar to macrophage-like cells seen in GMP datasets, which also were Gata2highPU.1high (see Extended Data Fig. 3). As such, all cells for which PU.1 > 75 and < 150 were assigned to P1H. d, Probability distribution for Gata2 in the remaining cells, fit with a two-component negative binomial. e, A distribution such as that in d cannot be definitively separated into high and low components owing to overlap in the distributions; therefore, cells are assigned probabilistically during KCA to the G2H or LES state in order to correct for false transitions arising from uncertainty in the assignment. See Supplementary Discussion for more details on the rationale and implementation of probabilistic gating.
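One way to picture the probabilistic gating described in e is as a posterior probability under a two-component negative binomial mixture. The Python sketch below is an illustration under stated assumptions, not the authors' implementation (which is described in the Supplementary Discussion): the parameterization (r, p), the mixture weight and all numerical values are hypothetical placeholders rather than fitted parameters from the paper.

```python
import math


def nb_log_pmf(k, r, p):
    """Log probability of count k under a negative binomial.

    Parameterized as k failures before r successes with success
    probability p; r may be non-integer. Mean is r * (1 - p) / p.
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))


def prob_high(k, weight_high, high, low):
    """Posterior probability that a cell with k mRNAs belongs to the
    high component of a two-component negative binomial mixture."""
    log_hi = math.log(weight_high) + nb_log_pmf(k, *high)
    log_lo = math.log(1.0 - weight_high) + nb_log_pmf(k, *low)
    shift = max(log_hi, log_lo)  # subtract the max for numerical stability
    hi = math.exp(log_hi - shift)
    lo = math.exp(log_lo - shift)
    return hi / (hi + lo)


# Hypothetical components: (r, p) pairs with means of 45 and 2 mRNA per cell.
HIGH = (5.0, 0.1)
LOW = (2.0, 0.5)
for count in (0, 10, 60):
    print(count, round(prob_high(count, 0.3, HIGH, LOW), 4))
```

Cells near the overlap of the two components receive intermediate probabilities, which is what allows a downstream analysis to weight ambiguous assignments instead of forcing a hard threshold.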
Extended Data Fig. 8 HSC colony data.
a, Endpoint cells are the leaves on each pedigree. Note that edge lengths are not scaled on time between divisions, and all endpoint cells are 96 h from the start of the experiment. Cells are colour-coded according to the colour scheme used throughout the manuscript. Megakaryocytes are labelled in orange. Nodes (cells) observed upstream of the endpoint (that is, no transcriptional data are available) are coloured black. b, Histogram of number of progeny from a single HSC. c–e, Proliferation phenotypes of cells based on endpoint state identity (P1H, n = 137; LES, n = 1,571; G1/2H, n = 81; G2H, n = 166). Cell lifetimes in e are the time interval between cell birth (last division) and the next cell division or cell death. Violin plots are normalized to area, with the centre box-and-whisker plots showing the mean (line), standard deviation (box) and 95% confidence interval (whiskers). In e, single dots represent outliers in the 99th percentile.
Extended Data Fig. 9 Robustness of inferred transition matrix to mRNA threshold.
a, Normalized deviation in the inferred transition matrices for each indicated threshold (n = 200 bootstrapping iterations) of Gata1 mRNA per cell relative to the reference matrix reported in this manuscript (cutoff = 10 mRNA per cell). The reference matrix is boxed. For any given transition (that is, matrix entry), the initial states are the columns and the final states are the rows. The colour code is the same as is used elsewhere in the manuscript. b, As in a for PU.1 (cutoff in manuscript = 75 mRNA per cell). c, Frobenius distance \(\sqrt{\sum_{i,j}{\left(T_{ij}^{\mathrm{ref}}-T_{ij}^{\mathrm{test}}\right)}^{2}}\) between each test matrix and the reference transition matrix. The solid black line indicates the background Frobenius distance derived from statistical uncertainty in the reference transition matrix, derived by bootstrapping through the analysis n = 1,000 times and picking random transition rates from a Gaussian distribution defined by inferred mean and standard deviation of the transition matrix. Frobenius distance values above this line significantly differ from the matrix reported in the manuscript.
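The distance in c and its Gaussian-resampling background can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the example matrices and the seed are hypothetical, and the caption does not state which summary of the resampled distances defines the background line, so the sketch simply returns all sampled distances.

```python
import math
import random


def frobenius_distance(t_ref, t_test):
    """Square root of the summed squared entry-wise differences."""
    return math.sqrt(sum((a - b) ** 2
                         for row_ref, row_test in zip(t_ref, t_test)
                         for a, b in zip(row_ref, row_test)))


def background_distances(mean, sd, n_draws=1000, seed=0):
    """Distances between the mean matrix and matrices whose entries are
    drawn independently from Gaussians with the given mean and s.d."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_draws):
        draw = [[rng.gauss(m, s) for m, s in zip(mean_row, sd_row)]
                for mean_row, sd_row in zip(mean, sd)]
        distances.append(frobenius_distance(mean, draw))
    return distances


# Hypothetical 2-state transition matrix and per-entry uncertainty.
T_REF = [[0.9, 0.2], [0.1, 0.8]]
T_SD = [[0.02, 0.03], [0.02, 0.03]]
T_TEST = [[0.7, 0.3], [0.3, 0.7]]
background = background_distances(T_REF, T_SD)
print(frobenius_distance(T_REF, T_TEST), max(background))
```

Comparing a test distance with the distribution of background distances then indicates whether changing the mRNA cutoff moves the inferred matrix by more than its own statistical uncertainty.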
Extended Data Fig. 10 Analysis of mRNA partitioning errors.
a, Representative image of a CMP in late anaphase. b, mRNA copy number in each sister cell in CMPs (n = 52) and HSCs (n = 46). r is the Pearson’s correlation coefficient for sister-cell mRNA copy number; the red dashed line is y = x. c, Correlation of mRNA levels between HSCs that divided within the last 1 h (n = 171). Pearson’s correlation coefficients (r) for each gene are listed.
Supplementary information
Supplementary Information
This file contains Supplementary Methods Sections 1-6, Supplementary Discussion Sections 1-3, Supplementary Figures 1-2 and Supplementary References.
Supplementary Table 1
Oligonucleotide Sequences for smFISH probes. For genes detected with two step smFISH, the appropriate readout probes are listed.
Supplementary Table 2
Gene lists for scRNAseq analyses. Gene names for all gene sets analyzed in each scRNAseq dataset.
Supplementary Table 3
GO terms (ranked by enrichment score) and top decile of genes by VI for scRNAseq analysis. Top decile of genes ranked by Variable importance from each scRNAseq dataset analyzed. Associated GO terms, ranked by enrichment score, for those top decile genes.
Supplementary Table 4
Inferred transcriptional parameters for CMP data. kon: probability of gene turning on; koff: probability of an ON gene turning off; kini: while in the on state, probability of RP2 initiation event; kd: decay rate of the mRNA.
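The four parameters listed for Supplementary Table 4 correspond to a two-state (ON/OFF) model of transcription. As a rough illustration of how such parameters generate an mRNA count, the Python sketch below runs a Gillespie-style simulation of a single allele. It assumes that kon, koff, kini and kd can be treated as rates per unit time, and it is not the authors' GSSA.m; the function name and the numerical values are hypothetical.

```python
import random


def simulate_telegraph(kon, koff, kini, kd, t_end, seed=None):
    """Gillespie simulation of one allele switching OFF <-> ON.

    Returns (gene_on, mrna) at time t_end, starting from an OFF gene
    with zero mRNA. All four parameters are treated as rates.
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    while True:
        rates = [
            0.0 if gene_on else kon,    # OFF -> ON
            koff if gene_on else 0.0,   # ON -> OFF
            kini if gene_on else 0.0,   # initiation, only while ON
            kd * mrna,                  # first-order mRNA decay
        ]
        total = sum(rates)
        if total == 0.0:
            break  # no reaction can fire
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        pick = rng.random() * total  # choose which reaction fires
        if pick < rates[0]:
            gene_on = True
        elif pick < rates[0] + rates[1]:
            gene_on = False
        elif pick < rates[0] + rates[1] + rates[2]:
            mrna += 1
        else:
            mrna -= 1
    return gene_on, mrna


# Hypothetical rates; the expected steady-state mean is
# (kini / kd) * kon / (kon + koff) = 5 mRNA per cell.
counts = [simulate_telegraph(1.0, 1.0, 10.0, 1.0, 20.0, seed=i)[1]
          for i in range(500)]
print(sum(counts) / len(counts))
```

Lowering kon while raising kini keeps the mean similar but makes expression burstier, which is the regime of infrequent, stochastic bursts that the abstract describes.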
Supplementary Table 5
List of antibodies used in this study.
Supplementary Table 6
Reagents used in the experiments used in this study.
Supplementary Table 7
List of software and associated URLs for download of software.
Supplementary Data 1
This zipped file includes: ColonyIDs.mat - Names of all colonies for KCA; Bdry.mat - Number of cells in each colony; Colony_[1:117].mat - 117 data frames containing all data necessary to perform KCA; KLdatamatrix.mat - KL progenitor data; KCA_datamatrix.mat - Collated Data matrix of all cells used in KCA.
Supplementary Data 2
This zipped file includes: FSP.m - Use this software for parameter inference based on best fit to the burst frequency and mRNA count/cell; getKLD.m - Used by FSP.m; markovBackTrace.m - Can be used to determine number of visits to state j given a cell is in state i at time t; GSSA.m - Stochastic simulations; treeBackTrace2.m - used by GenerateAllTrees.m; GenerateAllTrees.m - Generates pedigree maps of all colonies and also calculates time spent in each state; KCA.m - Generates frequency of states conditional on the presence of a state in the colony and runs the inference for transitions probabilities; ThreePtFreqs.m - 3-cell state frequency test; KCA_thresholdtesting.mlx - Used for testing different mRNA cutoff values for KCA and comparing to the reference matrix.
Cite this article
Wheat, J.C., Sella, Y., Willcockson, M. et al. Single-molecule imaging of transcription dynamics in somatic stem cells. Nature 583, 431–436 (2020). https://doi.org/10.1038/s41586-020-2432-4