Mammalian gene expression is inherently stochastic (refs. 1,2), and results in discrete bursts of RNA molecules that are synthesized from each allele (refs. 3–7). Although transcription is known to be regulated by promoters and enhancers, it is unclear how cis-regulatory sequences encode transcriptional burst kinetics. Characterization of transcriptional bursting, including the burst size and frequency, has mainly relied on live-cell (refs. 4,6,8) or single-molecule RNA fluorescence in situ hybridization (refs. 3,5,8,9) recordings of selected loci. Here we determine transcriptome-wide burst frequencies and sizes for endogenous mouse and human genes using allele-sensitive single-cell RNA sequencing. We show that core promoter elements affect burst size and uncover synergistic effects between TATA and initiator elements, which were masked at mean expression levels. Notably, we provide transcriptome-wide evidence that enhancers control burst frequencies, and demonstrate that cell-type-specific gene expression is primarily shaped by changes in burst frequencies. Together, our data show that burst frequency is primarily encoded in enhancers and burst size in core promoters, and that allelic single-cell RNA sequencing is a powerful model for investigating transcriptional kinetics.
Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003).
Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226 (2008).
Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).
Chubb, J. R., Trcek, T., Shenoy, S. M. & Singer, R. H. Transcriptional pulsing of a developmental gene. Curr. Biol. 16, 1018–1025 (2006).
Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling. Science 297, 836–840 (2002).
Suter, D. M. et al. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474 (2011).
Nicolas, D., Phillips, N. E. & Naef, F. What shapes eukaryotic transcriptional bursting? Mol. Biosyst. 13, 1280–1290 (2017).
Fukaya, T., Lim, B. & Levine, M. Enhancer control of transcriptional bursting. Cell 166, 358–368 (2016).
Bartman, C. R., Hsu, S. C., Hsiung, C. C.-S., Raj, A. & Blobel, G. A. Enhancer regulation of transcriptional bursting parameters revealed by forced chromatin looping. Mol. Cell 62, 237–247 (2016).
Walters, M. C. et al. Enhancers increase the probability but not the level of gene expression. Proc. Natl Acad. Sci. USA 92, 7125–7129 (1995).
Zabidi, M. A. et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559 (2015).
Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Reinius, B. et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat. Genet. 48, 1430–1435 (2016).
Reinius, B. & Sandberg, R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
Kim, J. K. & Marioni, J. C. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 14, R7 (2013).
Jiang, Y., Zhang, N. R. & Li, M. SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol. 18, 74 (2017).
Peccoud, J. & Ycart, B. Markovian modeling of gene-product synthesis. Theor. Popul. Biol. 48, 222–234 (1995).
Dar, R. D. et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc. Natl Acad. Sci. USA 109, 17454–17459 (2012).
Eisenberg, E. & Levanon, E. Y. Human housekeeping genes are compact. Trends Genet. 19, 362–365 (2003).
Hornung, G. et al. Noise-mean relationship in mutated promoters. Genome Res. 22, 2409–2417 (2012).
Tantale, K. et al. A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat. Commun. 7, 12248 (2016).
Malecová, B., Gross, P., Boyer-Guittaut, M., Yavuz, S. & Oelgeschläger, T. The initiator core promoter element antagonizes repression of TATA-directed transcription by negative cofactor NC2. J. Biol. Chem. 282, 24767–24776 (2007).
Li, Y. et al. CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS One 9, e114485 (2014).
Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).
Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).
Hantsche, M. & Cramer, P. Conserved RNA polymerase II initiation complex structure. Curr. Opin. Struct. Biol. 47, 17–22 (2017).
Roeder, R. G. The role of general initiation factors in transcription by RNA polymerase II. Trends Biochem. Sci. 21, 327–335 (1996).
Jonkers, I. & Lis, J. T. Getting up to speed with transcription elongation by RNA polymerase II. Nat. Rev. Mol. Cell Biol. 16, 167–177 (2015).
We thank Q. Deng for ES cell culturing, S. Giatrellis for assistance with FACS sorting, G.-J. Hendriks for discussions, and the remainder of the Sandberg laboratory. This work was supported by grants to R.S. from the European Research Council (648842), the Swedish Research Council (2017-01062), the Knut and Alice Wallenberg’s foundation (2017.0110) and the Bert L. and N. Kuggie Vallee Foundation.
Nature thanks C. Bartman, P. Kharchenko, A. Raj and the anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Illustration of the two-state model of transcription. The promoter can be in an ON or OFF state and converts from OFF to ON with rate kon, and from ON to OFF with rate koff. RNAs are transcribed with rate ksyn in the ON state and degraded with rate deg. See Supplementary Methods. b, Derivation of a confidence interval for a simulated set of observations with a given burst frequency of 0.5 (n = 200 simulated observations). The quadratic function shown in blue is a transformed version of the log-likelihood as a function of burst frequency, in which the most likely parameter value has a value of 0. Standard theory for likelihood methods gives a cutoff value of 1.92 for a 95% confidence interval (solid red line); the points at which the curve crosses this cutoff can be traced down to their corresponding values on the x axis (dashed red lines) to derive the confidence interval. The true value, shown as a green dot, is within the confidence interval. c, Goodness-of-fit test for 7,382 genes on the C57 allele of the fibroblast cells (from molecular level input data). The histogram shows the mean expression levels of genes with a good (green) or bad (red) fit (Supplementary Methods). d, Scatter plot of the Akaike information criterion (AIC) for the inference obtained from molecule counts (UMIs) and RPKM values. The green line denotes y = x. e, f, Scatter plots of the burst frequency and size obtained from the inference procedure based on either molecules (UMIs) or RPKM values. g, Scatter plot of mean expression against inferred burst frequency for all genes in fibroblasts. Red line denotes spline fitted to data. h, Scatter plot of mean expression against inferred burst size for all genes in fibroblasts. Red line denotes spline fitted to data. Data in g and h are from the CAST allele. i, Scatter plot of the percentage of biallelic to silent cells for 10,727 genes in fibroblasts. The genes are located on the expected curve under the independence model (see Supplementary Methods and ref. 12 for details).
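The two-state model in panel a can be explored with a small stochastic simulation. The following is a minimal sketch, not the authors' inference code: it runs a Gillespie simulation of the promoter switching, synthesis and degradation reactions and returns the time-averaged RNA count. The function names, the parameter values in the usage note, and the convention that burst frequency is kon and burst size is ksyn/koff are assumptions made for illustration.

```python
import random


def burst_size(ksyn, koff):
    """Mean RNAs made per ON period (common convention; assumed here)."""
    return ksyn / koff


def stationary_mean(kon, koff, ksyn, deg):
    """Expected RNA count at steady state for the two-state model."""
    return kon / (kon + koff) * ksyn / deg


def simulate_two_state(kon, koff, ksyn, deg, t_end, seed=0):
    """Gillespie simulation; returns the time-averaged RNA count on [0, t_end]."""
    rng = random.Random(seed)
    t, promoter_on, n_rna = 0.0, False, 0
    area = 0.0  # integral of the RNA count over time
    while True:
        # Propensities: promoter switch, synthesis (ON only), degradation.
        r_switch = koff if promoter_on else kon
        r_syn = ksyn if promoter_on else 0.0
        r_deg = deg * n_rna
        total = r_switch + r_syn + r_deg
        t_next = t + rng.expovariate(total)
        if t_next >= t_end:
            area += n_rna * (t_end - t)
            break
        area += n_rna * (t_next - t)
        t = t_next
        # Choose which reaction fired, in proportion to its propensity.
        u = rng.random() * total
        if u < r_switch:
            promoter_on = not promoter_on
        elif u < r_switch + r_syn:
            n_rna += 1
        else:
            n_rna -= 1
    return area / t_end
```

With illustrative rates such as kon = 0.5, koff = 5, ksyn = 50 and deg = 1, calling `simulate_two_state(0.5, 5.0, 50.0, 1.0, 5000.0)` should give a time average close to `stationary_mean(0.5, 5.0, 50.0, 1.0)`, because a long single trajectory samples the stationary distribution.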
a–j, The distribution of inferred burst frequency and sizes as a function of sensitivity (loss of RNA molecules) and cell numbers, based on the location of the parameters in the kinetics parameter space. Centre lines denote the median; hinges denote the first and third quartiles; whiskers denote 1.5× the interquartile range (IQR). Distributions are based on 50 simulations for each unique combination of parameters and 100 cells for the sensitivity calculations. Inferred burst sizes were divided by the sensitivity used in simulation (as the inferred burst size scales linearly with sensitivity).
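The linear scaling of inferred burst size with sensitivity can be seen in a short moment calculation. This is a sketch under stated assumptions, not the derivation used in the Supplementary Methods: it assumes the bursty limit of the two-state model (so that true counts X have a negative binomial form), rates expressed in units of the degradation rate, and molecule loss modelled as independent capture of each RNA with probability s. The symbols f (burst frequency), b (burst size), s, X and Y are introduced here for illustration.

```latex
\begin{align}
\mathbb{E}[X] &= f\,b, &
\operatorname{Var}[X] &= f\,b\,(1 + b), \\
Y \mid X &\sim \operatorname{Binomial}(X, s), \\
\mathbb{E}[Y] &= s\,\mathbb{E}[X] = f\,(s\,b), \\
\operatorname{Var}[Y] &= s^{2}\operatorname{Var}[X] + s\,(1 - s)\,\mathbb{E}[X]
                       = f\,(s\,b)\,\bigl(1 + s\,b\bigr).
\end{align}
```

Under these assumptions the observed counts Y have the same mean and variance structure as X, with b replaced by s b and f unchanged, which is why dividing the inferred burst size by the sensitivity recovers the simulated value.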
Extended Data Fig. 3 Gene length effect on burst size and frequency and the effect of core promoter elements on mean expression and burst frequency.
a, b, Scatter plots of median burst size (a) and frequency (b) compared to median gene length. Genes were binned (50 genes per group). c, Box plot of genes binned according to gene loci length (20 genes per group). For each bin, we ranked genes according to their transcript lengths and calculated the gene-level difference to the median burst size of that bin. We see no effect of differing transcript lengths on estimated burst size. Box plots are as in Extended Data Fig. 2. d, e, Mean expression (d) and burst frequency (e), ordered and coloured for genes based on their core promoter elements. (Complementing the analysis presented in Fig. 2b, but with mean expression and burst frequency as dependent variables.) The results of the linear regressions are shown in Supplementary Table 2 (n = 7,186 genes). f, Scatter plot of burst frequency and size of genes, with each dot coloured by its mean expression level. g, Box plots showing the inferred burst size for genes separated according to the presence of core promoter elements, and further grouped into five bins (quintiles, QU1–QU5) according to gene loci lengths. No TATA or initiator: n = 4,397 genes (2,585, 1,124, 635, 36 and 17 in each quintile, respectively). Only initiator: n = 2,035 genes (942, 531, 442, 74 and 46 in each quintile, respectively). Only TATA: n = 359 genes (129, 126, 58, 31 and 15 in each quintile, respectively). TATA and initiator: n = 144 genes (53, 45, 24, 19 and 3 in each quintile, respectively).
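The idea of a TATA and initiator interaction term can be illustrated with a 2×2 design. The sketch below is a simplification and not the published regression: it assumes a saturated model with only the two presence/absence indicators and their interaction, leaving out the gene-length terms. In that special case the ordinary least squares coefficients equal contrasts of group means, so no matrix algebra is needed. The function name and the toy values are hypothetical.

```python
from statistics import mean


def interaction_contrasts(rows):
    """OLS coefficients of y ~ TATA + INR + TATA:INR for 0/1 indicators.

    rows: iterable of (tata, inr, y) tuples, for example y = log10 burst size.
    For this saturated design the coefficients are group-mean contrasts.
    """
    groups = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for tata, inr, y in rows:
        groups[(tata, inr)].append(y)
    m = {key: mean(values) for key, values in groups.items()}
    return {
        "Intercept": m[(0, 0)],
        "TATA": m[(1, 0)] - m[(0, 0)],
        "INR": m[(0, 1)] - m[(0, 0)],
        # Positive values mean the combined effect exceeds the sum of the
        # two single-element effects (synergy on the chosen scale).
        "TATA:INR": m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)],
    }


# Made-up values for illustration only (not data from the study).
toy = [
    (0, 0, 0.10), (0, 0, 0.30),
    (1, 0, 0.40), (1, 0, 0.60),
    (0, 1, 0.20), (0, 1, 0.40),
    (1, 1, 0.90), (1, 1, 1.10),
]
coefs = interaction_contrasts(toy)
```

A regression that also includes gene length would need a general least squares solver, and its coefficients would no longer reduce to these simple contrasts.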
a–j, The power of detecting fourfold changes in burst frequency and size as a function of the number of cells depending on the location of transcriptional burst kinetic parameters in parameter space. Top, analysis of power for burst frequency and size in indicated location in parameter space. Bottom, histogram with expression distributions over cells at the different locations in parameter space.
a, b, Box plot visualization of cell-type differences in burst frequency and size, as a function of fold changes in mean expression between cell types, as in Fig. 3c. c, d, Box plots of the fold change in mean expression for the top 100 genes in each direction for burst frequency and size, respectively (n = 100 genes in each group, two-sided t-test). e, f, Box plots of the fold change in normalized read density of H3K27ac in enhancers (enhancer magnitude) between cell types, for enhancers linked to genes with the top 100 changes in either burst frequency or size (n = 100 genes in each group, two-sided t-test). g, Rolling median (n = 50) of SNPs per enhancer, ordered by the P value of the burst size difference between the CAST and C57 alleles in fibroblast cells (profile likelihood test, no adjustment for multiple comparisons).
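The rolling median in panel g can be sketched as follows. This is an illustration only: the caption does not state how windows are aligned, so the trailing, fully overlapping windows used here are an assumption, as are the function names and the `(p_value, snp_count)` record layout.

```python
from statistics import median


def rolling_median(values, window):
    """Median of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def snp_trend(records, window):
    """Order (p_value, snp_count) records by P value, then smooth the counts."""
    ordered = sorted(records, key=lambda record: record[0])
    return rolling_median([snps for _, snps in ordered], window)
```

For the caption's setting one would call `snp_trend(records, 50)`, with one record per enhancer.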
Extended Data Fig. 6 Representative images for cell identification and RNA transcript quantification using smFISH.
a, b, Two representative cells for the detection of Msl3 in male fibroblast (a) and male ES cell (b) (from 140 fibroblasts and 341 ES cells). From left to right: probe detection (Q570), antibody detection (Cy5), DAPI, and identified RNA transcripts. White rectangles in b denote the region zoomed-in for RNA transcript quantification. Original images are available at https://github.com/sandberg-lab/txburst.
a–d, Histograms of the expression distributions of genes measured by smFISH (left) and scRNA-seq (right) for genes: Hdac6 (a), Msl3 (b), Mpp1 (c) and Igbp1 (d). The number of cells quantified for each gene, cell type and method is presented above each figure item. e, f, Scatter plots of burst size (e) and frequency (f) inferred based on data from scRNA-seq and smFISH. Data from both fibroblasts and ES cells are shown. Although the few data points do not allow for a systematic comparison between methods, we observed a few trends. There was good agreement for both burst size and frequency, except for the gene Igbp1, which is an outlier in both scatter plots. Igbp1 has a higher burst size and a lower burst frequency in scRNA-seq than in smFISH. Excluding Igbp1, we see a fairly linear correspondence between methods over the remaining six data points (three genes and two cell types). g–j, Point estimates and confidence intervals shown for each gene, cell type and method based on the profile likelihood method. The number of cells used for the inference is shown in the corresponding histogram in a–d. P values for the cell-type comparison in burst kinetics are shown per method based on the profile likelihood test.
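The profile likelihood intervals reported here rest on the 1.92 cutoff, which is half the 95% quantile of a chi-square distribution with one degree of freedom. The sketch below illustrates the procedure on a stand-in model, not on the burst kinetics model itself: it assumes normally distributed data with unknown mean and variance, where the nuisance variance can be maximized out in closed form. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def lr_cutoff(level=0.95):
    """Half the chi-square (1 d.f.) quantile, via the squared normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * z / 2


def profile_loglik_mean(xs, mu):
    """Profile log-likelihood of mu for a normal model, up to a constant.

    The nuisance variance is replaced by its maximizer for this mu,
    sigma2_hat(mu) = mean((x - mu)^2).
    """
    n = len(xs)
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * math.log(sigma2)


def profile_ci(xs, grid, level=0.95):
    """Grid values whose profile log-likelihood is within the cutoff of the best."""
    loglik = [profile_loglik_mean(xs, mu) for mu in grid]
    best = max(loglik)
    cutoff = lr_cutoff(level)
    inside = [mu for mu, ll in zip(grid, loglik) if best - ll <= cutoff]
    return min(inside), max(inside)
```

For a model without a closed-form nuisance maximizer, the same scheme applies with a numerical optimization over the remaining parameters at each grid value.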
Extended Data Fig. 8 Expression distributions for genes with significant strain differences in burst kinetics.
a–d, Histograms of the expression distributions for the CAST and C57 alleles in fibroblasts for genes with burst frequency significantly up in CAST (a) and C57 (b), and burst size significantly up in CAST (c) and C57 (d). e, Expression distributions for the 129 and CAST alleles in the wild-type mouse ES cells and ES cells harbouring a CAST-linked deletion of a Sox2 enhancer.
a, Scatter plot of burst frequency between one-to-one orthologues of mouse and human (n = 1,609 genes). b, Scatter plot of burst size between one-to-one orthologues of mouse and human (n = 1,609 genes). c, Scatter plot of mean expression between one-to-one orthologues of mouse and human (n = 1,609 genes). d, Left, illustration of the test for conservation beyond mean expression level. In both mouse and human, the orthologue is compared to 50 genes of similar mean expression (7 genes in cartoon) and we determine whether the location on the diagonal is consistent relative to the median gene in both species. Right, the fraction of one-to-one orthologous genes (red) and shuffled orthologues (blue) with consistent positioning in transcriptional kinetics space (binomial test, based on 1,609 genes). Error bars denote standard deviations. The limited numbers of cells and the use of RPKM-based transcriptional burst kinetics inference could underestimate the degree of conservation in transcriptional burst kinetics.
a, c, Comparisons of inferred burst frequency (a) and size (c) for the C57 allele in fibroblasts with cells classified according to cell cycle phase. Scatter plots of burst frequency and size are shown for comparisons between S and G1 (a) and S and G2/M (c) phases. b, d, The Gene Ontology (GO) terms that are enriched in the group of genes with significant differential burst frequency between S and G1 (b) and S and G2/M (d) (n = 116 genes with differential burst frequency in b and 75 genes in d).
This file contains the Supplementary Methods and References
Supplementary Table 1 Transcriptional burst kinetics inferred in fibroblasts for C57Bl6 and CAST alleles. Transcriptome-wide inference of transcriptional burst kinetics using maximum likelihood and profile likelihood. Burst kinetics were inferred from allele-resolved single-cell RNA-seq data from 224 primary adult fibroblasts (code available on GitHub: https://github.com/sandberg-lab/txburst). For profile likelihood, the point estimates are followed by the lower and upper bounds of the confidence intervals. Mean expression corresponds to the mean number of unique UMIs detected per gene and cell.
Supplementary Table 2 Linear regression results for the effect of core promoter elements on burst size, frequency and mean expression. Results from the estimation of the effect of the presence of either a TATA element or an initiator element (INR) on burst size (sheet 1), burst frequency (sheet 2) and mean expression (sheet 3), as well as their interaction with each other (TATA:INR) and with the length of the gene (gl:TATA and gl:INR, respectively). Ordinary least squares (OLS) was used to estimate the parameters. Abbreviations: Dep. Variable: dependent variable, gl: gene length (log10 scale), Df: degrees of freedom, AIC: Akaike information criterion, BIC: Bayesian information criterion, coef: regression coefficient, std err: standard error of the regression coefficient, t: t-statistic, P>|t|: P value of the t-statistic, 0.025: lower bound of the 95% confidence interval of the parameters, 0.975: upper bound of the 95% confidence interval, Cond. No.: condition number, bs: burst size (log10 scale), bf: burst frequency (log10 scale), me: mean expression (log10 scale).
Supplementary Table 3 Transcriptional burst kinetics inferred in embryonic stem cells for C57Bl6 and CAST alleles. Transcriptome-wide inference of transcriptional burst kinetics using maximum likelihood and profile likelihood. Burst kinetics were inferred from allele-resolved single-cell RNA-seq data from 188 embryonic stem cells (code available on GitHub: https://github.com/sandberg-lab/txburst). For profile likelihood, the point estimates are followed by the lower and upper bounds of the confidence intervals. Mean expression corresponds to the mean number of unique UMIs detected per gene and cell.
Supplementary Table 4 Significant differences in transcriptional burst kinetics between fibroblasts and embryonic stem cells. Comparison of transcriptional burst kinetics between fibroblasts (n = 224 cells) and embryonic stem cells (n = 188 cells). Burst kinetic parameters per cell type are listed together with P values from the profile likelihood significance test for a difference in burst frequency or size between cell types.
Supplementary Table 5 Significant differences in transcriptional burst kinetics between CAST and C57 alleles in fibroblasts. Comparison of transcriptional burst kinetics between genotypes for the fibroblast cells (n = 224 cells). Burst kinetic parameters per allele are listed together with P values from the profile likelihood significance test for a difference in burst frequency or size between alleles.
Supplementary Table 6 Transcriptional burst kinetics in mouse and human fibroblasts. Comparison of transcriptional burst kinetics between mouse and human (n = 163 cells) fibroblasts (n = 1,609 one-to-one orthologues). The C57 allele is used in mouse and the haplotype designated ‘A’ in human (we have no information regarding whether this is the paternal or maternal allele). Burst kinetic parameters per species are listed with their respective gene name and corresponding mean expression.