scSLAM-seq reveals core features of transcription dynamics in single cells

Abstract

Single-cell RNA sequencing (scRNA-seq) has highlighted the important role of intercellular heterogeneity in phenotype variability in both health and disease (ref. 1). However, current scRNA-seq approaches provide only a snapshot of gene expression and convey little information on the true temporal dynamics and stochastic nature of transcription. A further key limitation of scRNA-seq analysis is that the RNA profile of each individual cell can be analysed only once. Here we introduce single-cell, thiol-(SH)-linked alkylation of RNA for metabolic labelling sequencing (scSLAM-seq), which integrates metabolic RNA labelling (ref. 2), biochemical nucleoside conversion (ref. 3) and scRNA-seq to record transcriptional activity directly by differentiating between new and old RNA for thousands of genes per single cell. We use scSLAM-seq to study the onset of infection with lytic cytomegalovirus in single mouse fibroblasts. The cell-cycle state and dose of infection deduced from old RNA enable dose–response analysis based on new RNA. scSLAM-seq thereby both visualizes and explains differences in transcriptional activity at the single-cell level. Furthermore, it depicts ‘on–off’ switches and transcriptional burst kinetics in host gene expression with extensive gene-specific differences that correlate with promoter-intrinsic features (TBP–TATA-box interactions and DNA methylation). Thus, gene-specific, and not cell-specific, features explain the heterogeneity in transcriptomes between individual cells and the transcriptional response to perturbations.

Fig. 1: scSLAM-seq resolves transcriptional activity at the single-cell level.
Fig. 2: scSLAM-seq and NTR velocities.
Fig. 3: scSLAM-seq depicts the mode of gene regulation and differentially activated pathways in single cells.
Fig. 4: scSLAM-seq reveals bursting kinetics and core features of heterogeneity in transcription.

Data availability

The sequencing data and gene tables are available from the Gene Expression Omnibus (GEO) under accession number GSE115612. The script files are available at Zenodo (doi: 10.5281/zenodo.1299119). GRAND-SLAM is available for non-commercial use at http://software.erhard-lab.de.

References

  1. Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 34, 1145–1160 (2016).
  2. Dölken, L. et al. High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay. RNA 14, 1959–1972 (2008).
  3. Herzog, V. A. et al. Thiol-linked alkylation of RNA to assess expression dynamics. Nat. Methods 14, 1198–1204 (2017).
  4. Jürges, C., Dölken, L. & Erhard, F. Dissecting newly transcribed and old RNA using GRAND-SLAM. Bioinformatics 34, i218–i226 (2018).
  5. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
  6. Terhune, S. S., Schröer, J. & Shenk, T. RNAs are packaged into human cytomegalovirus virions in proportion to their intracellular concentration. J. Virol. 78, 10390–10398 (2004).
  7. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
  8. Wu, Z., Zhang, Y., Stitzel, M. L. & Wu, H. Two-phase differential expression analysis for single cell RNA-seq. Bioinformatics 34, 3340–3348 (2018).
  9. Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
  10. Marcinowski, L. et al. Real-time transcriptional profiling of cellular and viral gene expression during lytic cytomegalovirus infection. PLoS Pathog. 8, e1002908 (2012).
  11. Krause, E., de Graaf, M., Fliss, P. M., Dölken, L. & Brune, W. Murine cytomegalovirus virion-associated protein M45 mediates rapid NF-κB activation after infection. J. Virol. 88, 9963–9975 (2014).
  12. Fan, J. et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244 (2016).
  13. Pachkov, M., Balwierz, P. J., Arnold, P., Ozonov, E. & van Nimwegen, E. SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res. 41, D214–D220 (2013).
  14. Lio, C.-W. J. et al. cGAS-STING signaling regulates initial innate control of cytomegalovirus infection. J. Virol. 90, 7789–7797 (2016).
  15. Rand, U. et al. Multi-layered stochasticity and paracrine signal propagation shape the type-I interferon response. Mol. Syst. Biol. 8, 584 (2012).
  16. Hinata, K., Gervin, A. M., Jennifer Zhang, Y. & Khavari, P. A. Divergent gene regulation and growth effects by NF-κB in epithelial and mesenchymal cells of human skin. Oncogene 22, 1955–1964 (2003).
  17. Hodges, C., Bintu, L., Lubkowska, L., Kashlev, M. & Bustamante, C. Nucleosomal fluctuations govern the transcription dynamics of RNA polymerase II. Science 325, 626–628 (2009).
  18. Tantale, K. et al. A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat. Commun. 7, 12248 (2016).
  19. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
  20. Zoller, B., Nicolas, D., Molina, N. & Naef, F. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol. Syst. Biol. 11, 823 (2015).
  21. Koch, A. et al. Analysis of DNA methylation in cancer: location revisited. Nat. Rev. Clin. Oncol. 15, 459–466 (2018).
  22. Maza, I. et al. Transient acquisition of pluripotency during somatic cell transdifferentiation with iPSC reprogramming factors. Nat. Biotechnol. 33, 769–774 (2015).
  23. Reinius, B. & Sandberg, R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
  24. Kiefer, L., Schofield, J. A. & Simon, M. D. Expanding the nucleoside recoding toolkit: revealing RNA population dynamics with 6-thioguanosine. J. Am. Chem. Soc. 140, 14567–14570 (2018).
  25. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
  26. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
  27. Liu, Z. et al. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat. Commun. 8, 22 (2017).

Acknowledgements

We thank J. Vogel and S. Gorski for comments on the manuscript and T. Walzthoeni (Bioinformatics Core Facility, Institute of Computational Biology, Helmholtz Zentrum München) for bioinformatics support. The work was funded by the European Research Council (ERC-2016-CoG 721016-HERPES and ERC-2018-PoC-DL3 832409 T-GRAND-SLAM) and Infect-ERA grant eDEVILLI (031L0005B) to L.D. The Helmholtz Institute for RNA-based Infection Research (HIRI) supported this work with a seed grant through funds from the Bavarian Ministry of Economic Affairs and Media, Energy and Technology (grant allocation no. 0703/68674/5/2017 and 0703/89374/3/2017). F.J.T. and M.L. acknowledge financial support by the Graduate School QBM (GSC 1006); F.J.T. was supported by BMBF grants 01IS18036A and 01IS18053A, the German Research Foundation (DFG) within the Collaborative Research Centre 1243, Subproject A17, the Helmholtz Association (Incubator grant sparse2big, ZT-I-0007) and the Chan Zuckerberg Initiative DAF (advised fund of Silicon Valley Community Foundation, 182835). M.L. and P.A. acknowledge financial support from the Joachim Herz Stiftung and IZKF, respectively.

Author information

Conceptualization: F.E., A.-E.S. and L.D.; computational methodology: M.L., C.S.J. and F.E.; investigation: F.E., M.A.P.B., T.K., T.H., M.L., P.A., F.J.T. and A.-E.S.; infection experiments: M.A.P.B. and T.H.; establishment of scSLAM-seq: M.A.P.B., T.K., P.A., A.-E.S. and L.D.; writing: F.E., A.-E.S. and L.D.; funding acquisition: A.-E.S., F.E. and L.D.; supervision: F.J.T., A.-E.S., F.E. and L.D.

Correspondence to Florian Erhard or Antoine-Emmanuel Saliba or Lars Dölken.

Ethics declarations

Competing interests

A patent (EP 18 17 9371) has been filed on the GRAND-SLAM approach to analyse the relative contribution of transcriptional activity based on U-to-C conversions. The authors declare no other competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 scSLAM-seq quality controls.

a, Total number of genes detected after scSLAM-seq across all four experimental conditions (uninfected and CMV-infected cells; two biological replicates) versus the total read counts per single cell. The horizontal line indicates a threshold below which cells were excluded from the analysis. b, Partition of reads devoted to host (cellular), viral, spike-in control (External RNA Controls Consortium (ERCC)) and mitochondrial genes (Mitoch) across all individual cells. c, Rates of nucleotide substitutions demonstrate efficient conversion rates in 4sU-treated single cells (4sU) compared with 4sU-naive cells (no4sU). This was true for reads originating from both cDNA strands (sense and antisense) as well as overlapping parts of the paired-end sequencing (overlap). d, As in c, zoomed into the range (y axis) 0 to 0.004. e, Number of genes per cell for which the NTR could be quantified with high precision (90% credible interval (CI) < 0.2) compared with the detected genes and reliably detected genes (TPM > 10). f, Correlation between expression levels of bulk RNA-seq with the pooled scRNA-seq data for total, new and old RNA. Genes are coloured according to RNA half-life. Pearson’s correlation coefficient (R) and the number of genes used (n) are indicated. g, Identification of highly variable genes (magenta) using ERCC spike-ins to model the technical noise applied to total RNA (1% false discovery rate). Squared coefficients of variation (CV²) are plotted against the average normalized read counts for all cells that pass the quality-control filters. The solid pink line fits the average values for ERCC spike-ins (blue dots; ref. 25). The dashed line marks the expected position of genes with 50% biological coefficient of variation. h, PCA (the first two components are depicted) of highly variable genes including the viral transcripts (infected: n = 2 replicates, 49 cells; uninfected: n = 2 replicates, 45 cells). i, Correlation of the percentage of viral reads with the distance to uninfected cells in the first two principal components as measured by logistic regression for the PCA in h (n = 2 replicates, 49 cells). j, As in i, for the PCA in Fig. 2a (n = 2 replicates, 49 cells).
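
The CV²-versus-mean plot in panel g can be illustrated with a minimal Python sketch. It computes per-gene CV² across cells and fits a technical-noise curve of the form CV² = a1/mean + a0 to spike-in values. The function names are hypothetical, and the ordinary least-squares fit is a simplifying assumption made here for brevity; it is not the fitting procedure of ref. 25 and not the authors' script.

```python
import numpy as np

def squared_cv(counts):
    """Per-gene mean and squared coefficient of variation across cells.

    counts: 2-D array-like, rows = genes, columns = cells
            (normalized read counts).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance
    with np.errstate(divide="ignore", invalid="ignore"):
        cv2 = np.where(mean > 0, var / mean**2, np.nan)
    return mean, cv2

def fit_technical_noise(spike_mean, spike_cv2):
    """Fit CV^2 = a1 / mean + a0 to spike-in data.

    Assumption: plain least squares, chosen only to keep the sketch short.
    """
    spike_mean = np.asarray(spike_mean, dtype=float)
    spike_cv2 = np.asarray(spike_cv2, dtype=float)
    design = np.column_stack([1.0 / spike_mean, np.ones_like(spike_mean)])
    coeffs, *_ = np.linalg.lstsq(design, spike_cv2, rcond=None)
    a1, a0 = coeffs
    return float(a1), float(a0)

def technical_cv2(mean, a1, a0):
    """Expected technical CV^2 at a given mean expression level."""
    return a1 / np.asarray(mean, dtype=float) + a0
```

Genes whose observed CV² lies well above `technical_cv2` at their mean expression would be candidates for high biological variability; the significance test behind the 1% false discovery rate is not reproduced here.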

Extended Data Fig. 2 Half-life estimates and PCA on regulated genes.

a, Correlation between RNA half-lives estimated from bulk SLAM-seq (n = 2 replicates). b, The fold change (FC), log2(MCMV/Mock), of total RNA from bulk SLAM-seq is plotted against the log2-transformed fold change of new RNA from bulk sequencing, stratified for different RNA half-lives (average of n = 2 replicates). c, PCA on genes that are differentially expressed in new RNA from the bulk experiments (absolute log2-transformed fold change > 0.5). Top, PCA on genes with short RNA half-lives (less than 2 h). Bottom, PCA on genes with long RNA half-lives (more than 4 h). Left to right, PCA was performed using total, old or new RNA, or the NTR (infected, n = 2 replicates, 49 cells; uninfected, n = 2 replicates, 45 cells). ARI, adjusted Rand index. d, Correlation analysis of the PCA from c with the percentage of viral reads. Pearson’s correlation coefficients and P values determined by a t-test on Pearson’s correlation coefficient are indicated (see Extended Data Fig. 1j) (n = 2 replicates, 49 cells).
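
For orientation, the link between an NTR and an RNA half-life can be sketched with the standard steady-state relation for metabolic labelling, NTR = 1 − exp(−δt), where δ is the decay rate and t the labelling time. This is a textbook approximation, and the caption does not state that the half-lives shown here were computed this way. The function name and the `labelling_time_h` parameter are hypothetical.

```python
import math

def half_life_from_ntr(ntr, labelling_time_h):
    """RNA half-life (hours) from a new-to-total RNA ratio.

    Assumes steady-state expression and first-order decay, so that
    NTR = 1 - exp(-decay_rate * labelling_time_h).
    """
    if not 0.0 < ntr < 1.0:
        raise ValueError("ntr must lie strictly between 0 and 1")
    if labelling_time_h <= 0:
        raise ValueError("labelling_time_h must be positive")
    decay_rate = -math.log(1.0 - ntr) / labelling_time_h
    return math.log(2.0) / decay_rate
```

Under these assumptions, a gene whose RNA is half new after a given labelling time has a half-life equal to that labelling time. Genes that are not at steady state, such as those induced by infection, violate the assumption.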

Extended Data Fig. 3 RNA and NTR velocities.

a, PCA computed on velocity values, on expression values projected 1 h into the future using velocity, and on the intron/exon count ratio for the scSLAM-seq data using the same set of genes as in Fig. 2a (uninfected, n = 2 replicates, 43 cells; infected, n = 2 replicates, 44 cells). b, PCA for the 10x data on the same set of genes as used for Fig. 2a. On the basis of mature transcripts (exonic reads only; left), uninfected cells were only marginally separated from infected cells. By contrast, on the basis of the ratio of intronic/exonic reads (right), infected cells were almost perfectly separated from uninfected cells (uninfected, n = 2 replicates, 793 cells; infected, n = 2 replicates, 353 cells). c, As in Fig. 2d, except velocities were computed on the basis of degradation rates estimated from uninfected cells only; that is, not violating the steady-state assumption but supplying the class labels (violating the blind test of prediction). d, Scatter plot comparing NTRs with velocities from the 10x scRNA-seq data for down- and upregulated genes.

Extended Data Fig. 4 scSLAM-seq differentiates incoming virion-associated RNA from de novo transcribed viral RNA.

Heat maps showing the levels of old (left) and new (right) RNA relative to the maximal total level for each viral gene (rows) per CMV-infected cell (columns). Cells are sorted according to the percentage of viral reads among all reads from the cell. Kinetic classes are indicated. E, early; IE, immediate early; L, late; ND, not defined. The ratio of new to old RNA (log2(new/old)) and total expression from pooled cells for each viral gene are depicted.

Extended Data Fig. 5 Correlation of viral gene expression with dose of infection and cell cycle.

a, Comparison of virus stock-derived RNA (virion-associated RNA) and old RNA levels of CMV genes. Mean levels obtained from four independent virus stock vials and mean expression levels in the CMV-infected cells (n = 2 replicates, 49 cells) are compared. The colours indicate viral genes of different kinetic classes. TL, true late. Pearson’s correlation coefficient and P values determined by a t-test on Pearson’s correlation coefficient are indicated. b, As in a but normalized for the total expression levels of the respective genes in the CMV-infected cells. Pearson’s correlation coefficient and P values are indicated. c, Scatter plot comparing the predicted extent of viral gene expression per individual cell (n = 2 replicates, 49 cells) on the basis of the dose of infection with the observed expression. P value determined by likelihood ratio test. d, Distribution of viral reads for cells in G1 (n = 9 cells), S (n = 20 cells) or G2/M (n = 20 cells) phases at the beginning of infection. P value determined by two-sided Wilcoxon test. e, Extent of cell-cycle disruption on the basis of cell-cycle projections derived from old and total RNA of uninfected (mock, n = 2 replicates, 45 cells) and CMV-infected (n = 2 replicates, 49 cells) cells for G1, S and G2/M phases. Individual cells are shown as dots. P values determined by two-sided Wilcoxon tests. f, Unbiased pathway and gene set overdispersion analysis (PAGODA; ref. 12) revealed Gene Ontology (GO) terms associated with mock- and CMV-infected cells. The fraction (total, new or old) in which each signature was found is indicated. g, h, The NF-κB (g) and IFN (h) response signature score for each cell (n = 2 replicates, 49 cells) is plotted against its viral RNA content. The linear regression fit (line), 95% credible interval (shading), Spearman’s ρ values and P values determined by t-test on Spearman’s ρ are indicated. i, j, Distribution of the extent of the NF-κB (i) and IFN (j) responses for cells in G1 (n = 9 cells), S (n = 20 cells) or G2/M (n = 20 cells) phase at the beginning of infection. P values determined by two-sided Wilcoxon tests. All box plots are as in Fig. 4c. Dots represent outliers.

Extended Data Fig. 6 B-scores, nUMIs and transcriptional bursts.

a, Correlation of B-scores (n = 2 replicates). Pearson’s correlation coefficient and P values determined by t-test on Pearson’s correlation coefficient are indicated. b, The number of old and new molecules (estimated by regression analysis with RNA spike-ins) is shown for Hif1a and Atg12. Both show extreme NTR variance, which results from very few sampled mRNA molecules. Dots represent maximum a posteriori estimates of uninfected (mock, n = 2 replicates, 45 cells) and CMV-infected (n = 2 replicates, 49 cells) cells. Error bars denote 90% credible intervals provided by GRAND-SLAM. c, The average mRNA copy numbers obtained from ref. 26 are plotted against the average copy numbers estimated by regression analysis with RNA spike-ins (P < 2 × 10^−16, two-sided t-test on Pearson’s correlation coefficient; see Supplementary Methods). d, The fraction of genes with nUMIs > 0 for all cells with detectable reads is shown for the six samples that were not labelled with 4sU. e, The distribution of gene-wise detection rates is shown for all genes that were detected at least once in the 10x and scSLAM-seq data (n = 12,784). A gene is called detected if it has at least one UMI or read in the 10x or scSLAM-seq data, respectively. f, The distributions of nUMI (left) or UMI (right) counts per cell are shown for uninfected and infected cells in the scSLAM-seq or 10x experiments, respectively. g, The average copy number obtained from ref. 26 is plotted against the average number of UMIs and nUMIs from the 10x uninfected (n = 2 replicates, 353 cells) and scSLAM-seq (n = 2 replicates, 45 cells) experiments, respectively. Lines represent the median capture rates. h, As in f, but using enUMIs.
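
To make the maximum a posteriori NTR estimates and 90% credible intervals in panel b more concrete, the following Python sketch evaluates a posterior for the NTR on a grid under a two-component binomial mixture over per-read conversion counts. It is a simplified illustration, not the GRAND-SLAM implementation. The conversion probability for new RNA (`p_new`) and the background error rate (`p_err`) are fixed, made-up values, the prior is uniform, and all function names are hypothetical.

```python
import numpy as np
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k conversions among n uridine positions."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def ntr_posterior(reads, p_new=0.05, p_err=0.001, grid_size=1001):
    """Grid posterior of the NTR under a uniform prior.

    reads: iterable of (n_uridines, n_conversions) per read.
    p_new, p_err: illustrative assumptions, not values from the paper.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    log_post = np.zeros(grid_size)
    for n, k in reads:
        lik_new = binom_pmf(k, n, p_new)  # read stems from new RNA
        lik_old = binom_pmf(k, n, p_err)  # read stems from old RNA
        log_post += np.log(grid * lik_new + (1.0 - grid) * lik_old)
    post = np.exp(log_post - log_post.max())
    return grid, post / post.sum()

def summarize(grid, post, level=0.90):
    """MAP estimate and equal-tailed credible interval from a grid posterior."""
    cdf = np.cumsum(post)
    tail = (1.0 - level) / 2.0
    last = len(grid) - 1
    lower = grid[min(int(np.searchsorted(cdf, tail)), last)]
    upper = grid[min(int(np.searchsorted(cdf, 1.0 - tail)), last)]
    return float(grid[np.argmax(post)]), float(lower), float(upper)
```

With few reads the interval returned by `summarize` becomes wide, which is the effect panel b attributes to very few sampled mRNA molecules.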

Extended Data Fig. 7 Genome browser screenshots visualizing nUMIs for exemplary genes.

a, Example genome browser screenshot of two single cells for the Atg12 gene showing individual reads. Grey and black denote the singly and doubly sequenced parts. The genomic sequence is colour-coded (A, red; C, green; G, blue; T, orange). Mismatches are indicated on the reads with the same colour code. On doubly sequenced parts, the top and bottom triangles represent the corresponding mismatches of the first and second read, respectively. In cell 84, there are two characteristic 4sU mismatches, which are observed in all reads. In cell 29, there are six mismatches, which are distinct from those observed in cell 84. In both cells, only a single new transcript with stochastic 4sU incorporation thus gave rise to the respective reads. b, Example genome browser screenshot for the Sqle gene. Here, at least four (nUMI = 4) mRNAs gave rise to the observed reads. However, as not all reads overlap, this is likely to be an underestimation of the actual number of cloned transcripts.

Extended Data Fig. 8 B-scores reflect stochastic transcriptional activity in single cells.

a, Comparison of B-scores (from n = 2 replicates, 45 cells) with bulk RNA expression levels stratified by RNA half-life. The r² values of ordinary linear regression are shown (lines indicated). Especially for genes with short-lived transcripts, there was no correlation, indicating that high B-scores are not due to inefficient RNA capture. b, Pearson’s correlation coefficient of NTR values for the top 10% most-variable genes for pairs of cells (n = 2 replicates, 45 cells) either in the same cell-cycle phase (purple), or in different stages of the cell cycle (green). c, Cells (n = 2 replicates, 45 cells) were ordered according to the cell cycle using reCAT (recover cycle along time; ref. 27) (x axis). The correlation of each cell with the next cell in the order, or the cell farthest away in the order (opposite), is shown. Pearson’s correlation coefficients were computed on the NTR values of the top 10% most-variable genes. d, Heat map showing the NTR values for marker genes of the S and G2/M phases of the cell cycle. Grey fields indicate undetected genes. Cells (columns, n = 2 replicates, 45 cells) were ordered according to the log odds (on–off versus S–G2/M). The log odds and associated P values (two-sided Fisher’s exact test, corrected by Benjamini–Hochberg) are indicated. Genes (rows) were ordered according to the correlation of their NTR values with the log odds order. e, Distribution of the Spearman correlation coefficient of the top 10% most-variable genes (B-score against cell-cycle log-odds order; see d). For the sake of comparison, cells were permuted randomly 100 times, and the corresponding distributions of the correlation coefficient are indicated. f, Scatter plot of RNA half-lives and the fraction of on cells (n = 2 replicates, 45 cells) among all cells with the gene in either on or off state. All box plots are as in Fig. 4c. Dots denote outliers.

Extended Data Fig. 9 Sequence logos overrepresented in promoter regions of dichotomous genes.

Sequence logos of the transcription factors with significantly enriched binding sites among genes with low (Tbp) or high (Patz1, Pml, Chd1, Sin3a and Zbtb14) B-scores obtained from the SwissRegulon database (ref. 13).

Extended Data Fig. 10 Bursting kinetics analyses repeated for subsets of genes.

a, Analyses as in Fig. 4a–e repeated only for the 1,718 significantly regulated genes according to the heterogeneity test (on the basis of nUMIs and enUMIs) (see Supplementary Methods). b, Analyses as in Fig. 4a–e repeated for the top 50% (n = 2,770) of expressed genes. All box plots are as in Fig. 4c. P values were determined by two-sided Wilcoxon test.

Supplementary information

Supplementary Information

This file contains Supplementary Methods, Supplementary References and a Supplementary Table Guide. The Supplementary Methods include detailed information on virus stock production and titration, sequencing of virus stock-derived RNA, virus infection and metabolic RNA labeling, library preparation and sequencing including droplet sequencing using 10x Genomics as well as data processing and analysis.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-8 – see Supplementary Information document for full table descriptions.

About this article

Cite this article

Erhard, F., Baptista, M.A.P., Krammer, T. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019) doi:10.1038/s41586-019-1369-y
