Single-cell RNA counting at allele and isoform resolution using Smart-seq3

Hagemann-Jensen, Michael; Ziegenhain, Christoph; Chen, Ping; Ramsköld, Daniel; Hendriks, Gert-Jan; Larsson, Anton J. M.; Faridani, Omid R.; Sandberg, Rickard

doi:10.1038/s41587-020-0497-0

Letter
Published: 04 May 2020

Nature Biotechnology volume 38, pages 708–714 (2020)

This article has been updated

Abstract

Large-scale sequencing of RNA from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states¹. However, current short-read single-cell RNA-sequencing methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells²,³. Here we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5′ unique molecular identifier RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Of the counted and reconstructed molecules, 60% could be directly assigned to allelic origin and 30–50% to specific isoforms, and we identified substantial differences in isoform usage in different mouse strains and human cell types. Smart-seq3 greatly increased sensitivity compared to Smart-seq2, typically detecting thousands more transcripts per cell. We expect that Smart-seq3 will enable large-scale characterization of cell types and states across tissues and organisms.
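
The UMI-based counting idea in the abstract can be illustrated with a minimal Python sketch. It assumes that reads have already been assigned to a cell and a gene, and it collapses reads sharing the same (cell, gene, UMI) triple into one molecule. The function name `count_molecules`, the input layout and the example sequences are hypothetical, UMI sequencing errors are ignored, and this is not the zUMIs implementation.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique UMIs per (cell, gene).

    `reads` is an iterable of (cell, gene, umi) tuples; this layout is an
    assumption made for illustration only.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        # Reads with an identical UMI for the same cell and gene are
        # treated as copies of one original RNA molecule.
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("cell1", "Actb", "AACGTTGA"),
    ("cell1", "Actb", "AACGTTGA"),  # duplicate read of the same molecule
    ("cell1", "Actb", "GGTTACCA"),
    ("cell2", "Actb", "AACGTTGA"),  # same UMI in another cell: separate molecule
]
counts = count_molecules(reads)
print(counts)
```

Counting sets of UMIs rather than reads is what makes the result insensitive to amplification duplicates.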

**Fig. 1: Overview of single-cell RNA sequencing in Smart-seq3.**

**Fig. 2: Single-cell RNA counting at allele and isoform resolution.**

**Fig. 3: Smart-seq3 analysis of a complex human sample.**

Data availability

All sequencing data have been deposited under ArrayExpress E-MTAB-8735 at the European Bioinformatics Institute.

Code availability

Capacity to process Smart-seq3 libraries has been incorporated in zUMIs (https://github.com/sdparekh/zUMIs). Code for molecule reconstruction and allele- and isoform-resolution assignments is available on GitHub (https://github.com/sandberg-lab/Smart-seq3).

Change history

07 May 2020
In the version of this article initially published online, the accession code E-MTAB-8735 in the Data Availability section linked to an unrelated ArrayExpress record. The error has been corrected in the HTML and PDF versions of the article.

References

1. Sandberg, R. Entering the era of single-cell transcriptomics in biology and medicine. Nat. Methods 11, 22–24 (2014).
2. Byrne, A. et al. Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nat. Commun. 8, 16027 (2017).
3. Gupta, I. et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat. Biotechnol. https://doi.org/10.1038/nbt.4259 (2018).
4. Mereu, E. et al. Benchmarking single-cell RNA sequencing protocols for cell atlas projects. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0469-4 (2020).
5. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643 (2017).
6. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
7. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
8. Bagnoli, J. W. et al. Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq. Nat. Commun. 9, 2937 (2018).
9. Guo, J. U. & Bartel, D. P. RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria. Science 353, aaf5371 (2016).
10. Ohtsubo, Y., Nagata, Y. & Tsuda, M. Compounds that enhance the tailing activity of Moloney murine leukemia virus reverse transcriptase. Sci. Rep. 7, 6520 (2017).
11. Cole, C., Byrne, A., Beaudin, A. E., Forsberg, E. C. & Vollmers, C. Tn5Prime, a Tn5 based 5′ capture method for single cell RNA-seq. Nucleic Acids Res. 46, e62 (2018).
12. Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
13. Parekh, S., Ziegenhain, C., Vieth, B., Enard, W. & Hellmann, I. zUMIs - a fast and flexible pipeline to process RNA sequencing data with UMIs. Gigascience 7, https://doi.org/10.1093/gigascience/giy059 (2018).
14. Reinius, B. et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat. Genet. 48, 1430–1435 (2016).
15. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
16. Martinez, N. M. & Lynch, K. W. Control of alternative splicing in immune responses: many regulators, many predictions, much still to learn. Immunol. Rev. 253, 216–236 (2013).
17. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
18. Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
19. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
20. Regev, A. et al. The human cell atlas. Elife 6, e27041 (2017).
21. Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).
22. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
23. Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
24. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
25. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

Acknowledgements

We would like to thank H. Heyn for providing us with the HCA sample, K. Annusver and P. Johnsson for help with image analysis, S. Parekh for help with the zUMIs pipeline and B. Reinius for discussions. C.Z. is supported by an EMBO long-term fellowship (ALTF 673–2017). G.-J.H. is funded by Human Frontier Science Program long-term fellowship LT000155/2017-L. This work was supported by grants to R.S. from the European Research Council (648842), the Swedish Research Council (2017–01062), the Knut and Alice Wallenberg Foundation (2017.0110), the Bert L. and N. Kuggie Vallee Foundation, the Göran Gustafsson Foundation and the National Institutes of Health. We would also like to acknowledge UPPMAX, National Genomics Infrastructure, Uppsala Genome Center funded by RFI and VR and Science for Life Laboratory, Sweden, National Genomics Infrastructure in Stockholm, funded by Science for Life Laboratory, and the Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
Michael Hagemann-Jensen, Christoph Ziegenhain, Daniel Ramsköld, Gert-Jan Hendriks, Anton J. M. Larsson & Rickard Sandberg
Integrated Cardio Metabolic Center, Karolinska Institutet, Stockholm, Sweden
Ping Chen, Omid R. Faridani & Rickard Sandberg
Lowy Cancer Research Centre, School of Medical Sciences, University of New South Wales, Sydney, Australia
Omid R. Faridani
Garvan Institute of Medical Research, Sydney, Australia
Omid R. Faridani

Contributions

M.H.-J. developed Smart-seq3 chemistry, generated scRNA-seq libraries, performed computational analysis, prepared figures and wrote the manuscript text. C.Z. provided input to Smart-seq3 chemistry, developed the reconstruction procedure, performed computational analysis, prepared figures and wrote the manuscript text. P.C. developed the reconstruction procedure, performed computational analyses and prepared figures. D.R. performed computational analysis. G.-J.H. and A.J.M.L. developed the reconstruction procedure. O.R.F. developed Smart-seq3 chemistry. R.S. planned and supervised work and wrote the manuscript.

Corresponding author

Correspondence to Rickard Sandberg.

Ethics declarations

Competing interests

R.S., M.H.-J. and O.R.F. have filed a patent application on Smart-seq3.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of sequenced conditions and iterations of Smart-seq3.

Each row shows a tested reaction condition and the number of genes detected in individual HEK293FT cells at 0.5 M raw fastq reads. The numbers of individual cells that contained at least one million sequenced reads per condition are listed on the right. Several earlier versions of Smart-seq2 with elements of Smart-seq3 chemistry are included as “Smart-seq-UMI” in this figure. The exact reaction conditions per row are listed in Supplementary Table 1. The asterisks on the right show the significance level (*0.05, **0.01, ***0.001, ****< 0.001 and ns for non-significance) when comparing the number of genes detected in each condition against the gene detection of Smart-seq2 (bottom row) using a two-sided Wilcoxon rank-sum test. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box.
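
The boxplot convention stated in this legend (and repeated in later legends) can be made concrete with a small Python sketch. It assumes that "1.5 lengths of the box" means 1.5 times the interquartile range measured from the box edges, and it uses the "inclusive" quartile definition from the standard library; other quartile definitions give slightly different boxes. The function name `boxplot_stats` and the example values are hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Return median, quartiles and whisker ends for one group of values."""
    data = sorted(values)
    # Quartile definitions vary between tools; 'inclusive' is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box = q3 - q1
    lower_limit = q1 - 1.5 * box
    upper_limit = q3 + 1.5 * box
    # Whiskers stop at the most extreme data points inside the limits;
    # anything beyond them would be drawn as an individual outlier.
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(v for v in data if v >= lower_limit),
        "whisker_high": max(v for v in data if v <= upper_limit),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

In this toy input the value 100 lies beyond the upper limit, so the upper whisker ends at 8.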

Extended Data Fig. 2 Effects of salts, PEG and additives on Smart-seq3 reverse transcription.

(a) Testing the performance of Maxima H-minus reverse transcription reactions under different reaction conditions. For each condition, boxplots summarize the number of unique UMIs detected in individual HEK293FT cells at 1 M raw fastq reads. We tested reverse transcription in either the standard KCl-based buffer or using NaCl or CsCl. Moreover, we evaluated the effects of adding 5% PEG or 1 mM dCTP (n = 16 cells per condition). (b) Reaction conditions as in (a) summarized against the number of genes identified from 1 million raw UMI-reads per cell (n = 16 cells per condition). (c) Reaction conditions as in (a) summarized against the number of genes identified from 1 million raw reads (sub-sampling from both 5’ UMI and internal reads) per cell (n = 16 cells per condition). (d) Genes were classified as having improved detection in Na or Cs salt (either going from undetected to detected, or positive log2 FC > 2 in UMI counts) versus detection in K salt buffer (n = 16 cells per condition). Boxplots show GC content in unchanged genes (n = 9,686), and genes with improved detection in Na (n = 8,477) and Cs salts (n = 6,261). Significance of differences in GC content between gene sets was evaluated using two-sided t-tests, as indicated in the figure. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box. (e) Genes with improved detection in Na and Cs are highly consistent. Shown are log2 fold-changes in UMI counts for genes in Na and Cs salt buffers compared to K salt buffer.
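
The classification rule in panel (d) can be written as a short Python sketch. It assumes that each gene has one summarized UMI count per buffer (how counts are aggregated over the 16 cells is not specified here) and that no pseudocount is added. The function name `improved_detection` and its parameters are hypothetical.

```python
import math


def improved_detection(k_count, alt_count, min_log2_fc=2.0):
    """Return True if a gene counts as 'improved' in the alternative salt.

    k_count   -- summarized UMI count in the K salt buffer (assumed input)
    alt_count -- summarized UMI count in the Na or Cs salt buffer
    """
    if alt_count <= 0:
        return False  # not detected in the alternative salt at all
    if k_count == 0:
        return True   # undetected in K salt, detected in the alternative salt
    return math.log2(alt_count / k_count) > min_log2_fc


print(improved_detection(0, 3))   # undetected to detected
print(improved_detection(2, 9))   # log2(4.5) is above the cutoff
print(improved_detection(2, 8))   # log2(4) equals the cutoff, not above it
```

The strict inequality follows the legend's "> 2" wording, so a fold-change of exactly 4 is left in the unchanged group.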

Extended Data Fig. 3 Tuning 5’ to internal read proportions and template switching oligo PCR priming.

(a) Bioanalyzer traces of libraries shown in Fig. 1c, demonstrating their different length distributions. (b) Sequencing the libraries shown in (a) on an Illumina HiSeq3000 results in higher fractions of 5’ UMI reads than when the same libraries are sequenced on the Illumina NextSeq500 (shown in Fig. 1c) (n = 16 HEK293FT cells per condition). Sequencing-machine biases are likely related to fragment length. (c) Enrichment of 5’ UMI-containing reads after tagmentation with a linear PCR step (forward pre-amplification PCR primer) of either 2 or 4 cycles, before adding index primers and index PCR (n = 16 HEK293FT cells per condition). (d) Increased UMI-containing reads with addition of custom i5 Illumina index oligos targeting the 5’ tag during index PCR (n = 96 fibroblasts per condition). (e) HEK293FT cell Bioanalyzer traces showing the effect of template switching oligo priming in PCR in the absence and presence of varying amounts of forward PCR primer. (f) UMIs detected at 100,000 UMI-reads at varying forward PCR primer concentrations with and without the template switching oligo in the PCR reaction (n = 48 HEK293FT cells per condition). (g) Number of genes detected from 100,000 UMI-containing reads with increasing amounts of forward PCR primer with or without the template switching oligo in the PCR reaction (n = 48 HEK293FT cells per condition). Significance in (f and g) was evaluated by two-sided t-tests, indicated on the figures. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box.

Extended Data Fig. 4 Improved detection of protein-coding and non-coding RNAs with Smart-seq3.

(a) Variants of Smart-seq3 reactions (n = 16 HEK293FT cells per condition) show improved detection of protein-coding RNAs and also other classes of RNAs, including poly-A+ lincRNAs, antisense RNAs, processed pseudogenes, processed transcripts and snoRNAs, compared to Smart-seq2 (n = 48 HEK293FT cells) and earlier experimental versions of Smart-seq2 with UMIs (here called “intermediate”) (n = 32 HEK293FT cells). (b) Genes of similar RNA classes detected by UMI-containing reads in Smart-seq2 protocols using UMIs (here called “intermediate”) (n = 32 cells) and Smart-seq3 variants (n = 16 cells per condition).

Extended Data Fig. 5 Visualization of read pairs from a single transcribed molecule and detailed comparison of burst kinetics inference based on Smart-seq2-UMI and Smart-seq3 data.

(a) Visualization of read pairs sequenced from one molecule from the Cox7a2l locus. The top shows the exons and introns in the Cox7a2l locus, with genomic coordinates (mm10). Each row shows a unique read pair, where orange boxes show the mapping of sequences onto the genomic loci, dotted lines indicate that the sequences are connected by the read pairs and solid lines indicate that the exon-intron junction was captured in the sequenced reads. Note that all read pairs combined span essentially the full transcript, meaning that for this molecule we could reconstruct the full transcript. (b) Coverage over CAST SNPs in 5’ UMI-containing read pairs in 369 mouse fibroblasts. Shown is the coverage in number of read pairs in a 10 bp sliding window of SNP distance relative to the TSS of their gene. Blue line indicates a loess fit of the data points. (c) Scatter plots showing the burst frequencies inferred for the C57 (x-axis) and CAST (y-axis) alleles for genes in mouse fibroblasts. The left plot shows the results based on Smart-seq3 data and the right panel shows the results from using Smart-seq2-UMI data. (d) Scatter plots showing the burst sizes inferred for the C57 (x-axis) and CAST (y-axis) alleles for genes in mouse fibroblasts. The left plot shows the results based on Smart-seq3 data and the right panel shows the results from using Smart-seq2-UMI data.
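
Panel (a) describes read pairs from one molecule that jointly cover a transcript. A minimal Python sketch of that idea merges the aligned blocks of all read pairs sharing a UMI and reports the covered length and any remaining gaps. It assumes half-open intervals in transcript coordinates; the function names and example coordinates are hypothetical, and this is not the reconstruction code published with the article.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends the current block of covered sequence.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def covered_length(intervals):
    """Total number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def coverage_gaps(intervals):
    """Uncovered stretches between consecutive merged blocks."""
    blocks = merge_intervals(intervals)
    return [(left[1], right[0]) for left, right in zip(blocks, blocks[1:])]


# Aligned blocks from read pairs that share one UMI (made-up coordinates).
read_blocks = [(0, 150), (100, 250), (400, 550), (500, 650)]
print(merge_intervals(read_blocks))
print(covered_length(read_blocks))
print(coverage_gaps(read_blocks))
```

A molecule with no gaps and blocks reaching both transcript ends would correspond to a fully reconstructed transcript in this simplified picture.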

Extended Data Fig. 6 PacBio sequencing of cDNA from two mouse fibroblasts.

(a) Scatter plot showing the observed RNA molecule length with PacBio (x-axis) against the reconstructed RNA molecule length with Smart-seq3 (y-axis). Molecules were matched by the sequencing of the same UMI per gene in the respective sequencing method. (b) Detailed information on the Col1a2 gene, for which Smart-seq3 had reconstructed 1,903 bp. The reconstruction was consistent with the 2,267 bp transcript sequenced by PacBio, and both were shorter than the 5,348 bp full-length Ensembl annotation. The two gaps in the Smart-seq3 reconstructions are shown as red lines. (c) Scatterplots showing the correspondence between observed molecules from PacBio sequencing (x-axis) and the observed molecules from Illumina sequencing from Smart-seq3 cDNA libraries from two mouse fibroblast cells. (d) Pie chart showing the consistency in isoform assignment between the PacBio and Smart-seq3 assigned isoforms.

Extended Data Fig. 7 Engineered DIY spikes for reconstruction validation.

(a) Overview illustrations of the two sets of DIY spikes designed (sets 1 and 2). Each set contains a full-length isoform and four shorter isoforms that harbor connected genetic variation in positions 5 to 35 (marked in red) and 100 bp downstream exclusions. (b) Cumulative percentage of spiked-in molecules that had their most 5’ base within the first five bases of the designed spike-in sequence. (c) Table summarizing correct and incorrect assignment of read pairs towards each diySpike isoform. (d) Histogram showing the read depth over the diySpikes, that is, the number of read pairs per observed molecule. (e) Fraction of molecules assigned to a unique isoform or compatible with two or more isoforms as a function of the unique number of read pairs from each molecule.
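
The distinction in panel (e) between molecules assigned to a unique isoform and molecules compatible with two or more isoforms can be sketched in Python. The sketch assumes that each isoform is described only by its set of exon-exon junctions and that a molecule is compatible with an isoform when every junction observed in its read pairs occurs in that isoform. The names, the junction encoding and the example isoforms are hypothetical, and actual assignment may use further evidence such as exon coverage or sequence variation.

```python
def compatible_isoforms(observed_junctions, isoform_junctions):
    """Return the sorted names of isoforms consistent with the observations."""
    observed = set(observed_junctions)
    return sorted(
        name
        for name, junctions in isoform_junctions.items()
        if observed <= set(junctions)  # every observed junction must exist
    )


# Three made-up isoforms of one gene, described by exon-exon junctions.
isoforms = {
    "isoA": {("e1", "e2"), ("e2", "e3")},
    "isoB": {("e1", "e3")},
    "isoC": {("e1", "e2"), ("e2", "e4")},
}

few_reads = compatible_isoforms([("e1", "e2")], isoforms)
more_reads = compatible_isoforms([("e1", "e2"), ("e2", "e3")], isoforms)
print(few_reads)
print(more_reads)
```

With a single observed junction the molecule remains ambiguous between two isoforms, while the additional junction narrows it to one, which mirrors the trend with increasing read pairs per molecule described in the legend.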

Extended Data Fig. 8 Species-mixing, doublets and mapping statistics in Smart-seq3.

(a) Scatter plot showing the number of reads that aligned to human (x-axis) and mouse (y-axis) for the complex HCA sample that contained human, mouse and dog cells. (b) Scatter plot showing the number of reads that aligned to human (x-axis) and dog (y-axis) for the complex HCA sample that contained human, mouse and dog cells. Few cells show any signal towards more than one genome, demonstrating a very low doublet rate. (c) Percentage of unmapped read pairs, and read pairs that aligned to exonic, intronic and intergenic regions, separated per protocol (Smart-seq2 and Smart-seq3) and experiment (HEK293FT, mouse fibroblasts, HCA cells). (d) Mapping statistics for 5’ UMI-containing read pairs in Smart-seq3: percentage of unmapped read pairs, and read pairs that aligned to exonic, intronic and intergenic regions, separated per experiment (HEK293FT, mouse fibroblasts, HCA cells). The boxplots shown in (c and d) show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 lengths of the box.
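
The species-mixing check in panels (a) and (b) can be illustrated with a small Python sketch that labels a cell by the genome receiving most of its aligned reads and flags cells with substantial signal from more than one genome. The 0.9 purity cutoff, the function name `classify_cell` and the example counts are assumptions for illustration; the legend does not state which threshold, if any, was applied.

```python
def classify_cell(read_counts, min_fraction=0.9):
    """Label a cell by its dominant genome, or 'mixed' for a likely doublet.

    read_counts  -- dict mapping genome name to number of aligned reads
    min_fraction -- hypothetical purity cutoff, not taken from the article
    """
    total = sum(read_counts.values())
    if total == 0:
        return "empty"
    species, top = max(read_counts.items(), key=lambda item: item[1])
    return species if top / total >= min_fraction else "mixed"


print(classify_cell({"human": 9500, "mouse": 300, "dog": 200}))
print(classify_cell({"human": 5000, "mouse": 4800, "dog": 200}))
```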

Extended Data Fig. 9 Gene detection of Smart-seq3 and HCA comparison against other scRNA-seq methods.

(a) Number of genes detected in Smart-seq3 and Smart-seq2 as a function of sequence depth. The median detection of genes across cells is represented as dots and the lines indicate the first and third quartiles. Separate plots were generated for cells of different cell types, as indicated on top of each figure item. The respective sample sizes for Smart-seq3 and Smart-seq2 are: B-cell (n = 366, n = 112 cells), CD4+ T-cell (n = 1,270, n = 356 cells), CD8+ T-cell (n = 665, n = 222 cells), HEK cell (n = 236, n = 62 cells), Monocyte (n = 200, n = 302 cells), NK-cell (n = 352, n = 152 cells). (b) UMAP visualizations of sequenced HCA sample cells on different scRNA-seq protocols (data from Mereu et al. 2019), colored according to the Louvain clustering performed independently on cells from each protocol. The same computational pipeline and parameters were used for these analyses as in Figure 3a, except requiring a depth of just 10,000 reads per cell. Please note that this analysis is not intended to be a thorough benchmarking of methods, as the data have merely been scaled and not sub-sampled to account for differences in sequencing depths or cell numbers between protocols. Instead, the full data per protocol (Quartz-seq2: n = 1,422 cells, CEL-seq2: n = 750, Smart-seq2: n = 1,160, 10x v2: n = 3,592, 10x v3: n = 6,175 cells) were analyzed and run through a standardized scRNA-seq analysis pipeline, revealing that the B-cells do not easily separate with these other methods.
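
Gene detection as a function of sequencing depth, as plotted in panel (a), can be sketched in Python by sub-sampling the reads of one cell and counting the distinct genes seen at each depth. The sketch assumes one gene label per read, shuffles once and takes nested prefixes so the curve cannot decrease, and uses a fixed seed for repeatability. The function name and toy data are hypothetical and do not reproduce the downsampling used for the figure.

```python
import random


def genes_detected_at_depths(read_genes, depths, seed=0):
    """Map each requested depth to the number of distinct genes detected.

    read_genes -- list with the gene assigned to each read of one cell
    depths     -- read depths to evaluate; depths above the total are skipped
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)
    detected = {}
    for depth in sorted(depths):
        if depth > len(shuffled):
            continue  # the cell was not sequenced deeply enough
        detected[depth] = len(set(shuffled[:depth]))
    return detected


# Toy cell: 100 reads spread unevenly over four genes.
read_genes = ["GeneA"] * 50 + ["GeneB"] * 30 + ["GeneC"] * 15 + ["GeneD"] * 5
curve = genes_detected_at_depths(read_genes, [10, 50, 100, 1000])
print(curve)
```

Summarizing such per-cell curves by the median and quartiles across cells would give the dots and lines described in the legend.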

Extended Data Fig. 10 HCA cluster markers, donor details and additional information on isoform assignments.

(a) Violin plots showing the distribution of expression for genes across all cell type clusters revealed in Figure 3d (total number of cells n = 3,129). Genes were selected to inform on overall types of cells and known sub-type markers. (b) UMAP of HCA sample cells (n = 3,129) with Smart-seq3 (as in Figure 3d) but colored according to donors. (c) Matrix showing the fraction of reconstructed molecules that could be assigned to either one or N isoforms, where molecules were first grouped by the number of annotated isoforms available for their genes. (d) Boxplots showing the number of molecules per cell (Smart-seq3 n = 3,129 cells, 10x v3 n = 6,175 cells) with unique assignments in Smart-seq3 and 10x Genomics (v3) for genes with more than one annotated isoform in Ensembl. Boxes indicate first and third quartiles with the median marked in white. Whiskers indicate the most extreme data point within 1.5 lengths of the box.

Supplementary information

Supplemental Information

Supplementary Note 1

Reporting Summary

Supplementary Table 1

List of all the reaction conditions shown in Supplementary Fig. 1.

Supplementary Table 2

List of mouse strain-specific isoform expression detected in primary fibroblasts.

Supplementary Table 3

Tables that list genes and isoforms with significant usage differences between human cell types.

About this article

Cite this article

Hagemann-Jensen, M., Ziegenhain, C., Chen, P. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol 38, 708–714 (2020). https://doi.org/10.1038/s41587-020-0497-0

Received: 22 October 2019
Accepted: 24 March 2020
Published: 04 May 2020
Issue Date: June 2020
DOI: https://doi.org/10.1038/s41587-020-0497-0
