Large-scale sequencing of RNA from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states¹. However, current short-read single-cell RNA-sequencing methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells²,³. Here we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5′ unique molecular identifier RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Of the counted and reconstructed molecules, 60% could be directly assigned to allelic origin and 30–50% to specific isoforms, and we identified substantial differences in isoform usage in different mouse strains and human cell types. Smart-seq3 greatly increased sensitivity compared to Smart-seq2, typically detecting thousands more transcripts per cell. We expect that Smart-seq3 will enable large-scale characterization of cell types and states across tissues and organisms.
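The counting and reconstruction idea in the abstract can be illustrated with a minimal sketch. It assumes that every read pair has already been assigned to a cell, a gene and a 5′ UMI, and that its aligned segments are given as half-open intervals in transcript coordinates. The record layout, the function names and the example values are hypothetical and are not taken from the authors' pipeline; UMI error correction and alignment are omitted.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval when the new one overlaps or touches it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def reconstruct_molecules(read_pairs):
    """Group read pairs by (cell, gene, UMI) and merge their covered segments.

    Each distinct key is counted as one RNA molecule; the merged intervals
    are the part of that molecule supported by sequenced read pairs.
    """
    segments = defaultdict(list)
    for rp in read_pairs:
        key = (rp["cell"], rp["gene"], rp["umi"])
        segments[key].extend(rp["segments"])
    return {key: merge_intervals(segs) for key, segs in segments.items()}


def covered_length(intervals):
    """Total number of bases covered by a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical read pairs: the 5' read carries the UMI, the mate maps downstream.
read_pairs = [
    {"cell": "c1", "gene": "GeneA", "umi": "ACGTACGT", "segments": [(0, 150), (300, 450)]},
    {"cell": "c1", "gene": "GeneA", "umi": "ACGTACGT", "segments": [(0, 150), (400, 600)]},
    {"cell": "c1", "gene": "GeneA", "umi": "TTGCAAGC", "segments": [(0, 150), (200, 350)]},
]

molecules = reconstruct_molecules(read_pairs)
for key, intervals in molecules.items():
    print(key, intervals, covered_length(intervals))
```

In this toy input, two UMIs for the same gene in the same cell give a count of two molecules, and read pairs sharing a UMI jointly extend the covered part of their molecule.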
All sequencing data have been deposited under ArrayExpress E-MTAB-8735 at the European Bioinformatics Institute.
Sandberg, R. Entering the era of single-cell transcriptomics in biology and medicine. Nat. Methods 11, 22–24 (2014).
Byrne, A. et al. Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nat. Commun. 8, 16027 (2017).
Gupta, I. et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat. Biotechnol. https://doi.org/10.1038/nbt.4259 (2018).
Mereu, E. et al. Benchmarking single-cell RNA sequencing protocols for cell atlas projects. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0469-4 (2020).
Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643 (2017).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Bagnoli, J. W. et al. Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq. Nat. Commun. 9, 2937 (2018).
Guo, J. U. & Bartel, D. P. RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria. Science 353, aaf5371 (2016).
Ohtsubo, Y., Nagata, Y. & Tsuda, M. Compounds that enhance the tailing activity of Moloney murine leukemia virus reverse transcriptase. Sci. Rep. 7, 6520 (2017).
Cole, C., Byrne, A., Beaudin, A. E., Forsberg, E. C. & Vollmers, C. Tn5Prime, a Tn5 based 5′ capture method for single cell RNA-seq. Nucleic Acids Res. 46, e62 (2018).
Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
Parekh, S., Ziegenhain, C., Vieth, B., Enard, W. & Hellmann, I. zUMIs - a fast and flexible pipeline to process RNA sequencing data with UMIs. Gigascience 7 https://doi.org/10.1093/gigascience/giy059 (2018).
Reinius, B. et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat. Genet. 48, 1430–1435 (2016).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Martinez, N. M. & Lynch, K. W. Control of alternative splicing in immune responses: many regulators, many predictions, much still to learn. Immunol. Rev. 253, 216–236 (2013).
Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).
Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
We would like to thank H. Heyn for providing us with the HCA sample, K. Annusver and P. Johnsson for help with image analysis, S. Parekh for help with the zUMIs pipeline and B. Reinius for discussions. C.Z. is supported by an EMBO long-term fellowship (ALTF 673–2017). G.-J.H. is funded by Human Frontier Science Program long-term fellowship LT000155/2017-L. This work was supported by grants to R.S. from the European Research Council (648842), the Swedish Research Council (2017–01062), the Knut and Alice Wallenberg Foundation (2017.0110), the Bert L. and N. Kuggie Vallee Foundation, the Göran Gustafsson Foundation and the National Institutes of Health. We would also like to acknowledge UPPMAX, National Genomics Infrastructure, Uppsala Genome Center funded by RFI and VR and Science for Life Laboratory, Sweden, National Genomics Infrastructure in Stockholm, funded by Science for Life Laboratory, and the Knut and Alice Wallenberg Foundation.
R.S., M.H.-J. and O.R.F. have filed a patent application on Smart-seq3.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Each row shows a tested reaction condition and the number of genes detected in individual HEK293FT cells at 0.5 M raw fastq reads. The numbers of individual cells that contained at least one million sequenced reads per condition are listed on the right. Several earlier versions of Smart-seq2 with elements of Smart-seq3 chemistry are included as “Smart-seq-UMI” in this figure. The exact reaction conditions per row are listed in Supplementary Table 1. The asterisks on the right show the significance level (*0.05, **0.01, ***0.001, ****< 0.001 and ns for non-significance) when comparing the number of genes detected in each condition against the gene detection of Smart-seq2 (bottom row) using a two-sided Wilcoxon rank-sum test. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box.
(a) Testing the performance of Maxima H-minus reverse transcription reactions under different reaction conditions. For each condition, boxplots summarize the number of unique UMIs detected in individual HEK293FT cells at 1 M raw fastq reads. We tested reverse transcription in either the standard KCl-based buffer or using NaCl or CsCl. Moreover, we evaluated the effects of adding 5% PEG or 1 mM dCTP (n = 16 cells per condition). (b) Reaction conditions as in (a) summarized against the number of genes identified from 1 million raw UMI-reads per cell (n = 16 cells per condition). (c) Reaction conditions as in (a) summarized against the number of genes identified from 1 million raw reads (sub-sampling from both 5’ UMI and internal reads) per cell (n = 16 cells per condition). (d) Genes were classified as having improved detection in Na or Cs salt (either going from undetected to detected, or positive log2 FC > 2 in UMI counts) versus detection in K salt buffer (n = 16 cells per condition). Boxplots show GC content in unchanged genes (n = 9,686), and genes with improved detection in Na (n = 8,477) and Cs salts (n = 6,261). Significance of differences in GC content between gene sets was evaluated using two-sided t-tests, as indicated in the figure. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box. (e) Genes with improved detection in Na and Cs are highly consistent. Shown are log2 fold-changes for genes when comparing UMI counts in Na and Cs salt buffers to those in K salt buffer.
Extended Data Fig. 3 Tuning 5’ to internal read proportions and template switching oligo PCR priming.
(a) Bioanalyzer traces of libraries shown in Fig. 1c, demonstrating their different length distributions. (b) Sequencing the libraries shown in (a) on an Illumina HiSeq3000 results in higher fractions of 5’ UMI reads than when the same libraries are sequenced on the Illumina NextSeq500 (shown in Fig. 1c) (n = 16 HEK293FT cells per condition). Sequencing-machine biases are likely related to fragment length. (c) Enrichment of 5’ UMI-containing reads after tagmentation with a linear PCR step (forward pre-amplification PCR primer) of either 2 or 4 cycles, before adding index primers and index PCR (n = 16 HEK293FT cells per condition). (d) Increased UMI-containing reads with addition of custom i5 Illumina index oligos targeting the 5’ tag during index PCR (n = 96 fibroblasts per condition). (e) HEK293FT cell bioanalyzer traces showing the effect of template switching oligo priming in PCR in the absence and presence of varying amounts of forward PCR primer. (f) UMIs detected at 100,000 UMI-reads at varying forward PCR primer concentrations with and without the template switching oligo in the PCR reaction (n = 48 HEK293FT cells per condition). (g) Number of genes detected from 100,000 UMI-containing reads with increasing amounts of forward PCR primer with or without the template switching oligo in the PCR reaction (n = 48 HEK293FT cells per condition). Significance in (f and g) was evaluated by two-sided t-tests, indicated on the figures. Boxplots denote median and first and third quartiles. Whiskers indicate the most extreme data point within 1.5 lengths of the box.
(a) Variants of Smart-seq3 reactions (n = 16 HEK293FT cells per condition) show improved detection of protein-coding RNAs and also other classes of RNAs, including poly-A+ lincRNAs, antisense RNAs, processed pseudogenes, processed transcripts and snoRNAs, compared to Smart-seq2 (n = 48 HEK293FT cells) and earlier experimental versions of Smart-seq2 with UMIs (here called “intermediate”) (n = 32 HEK293FT cells). (b) Genes of the same RNA classes detected by UMI-containing reads in Smart-seq2 protocols using UMIs (here called “intermediate”) (n = 32 cells) and Smart-seq3 variants (n = 16 cells per condition).
Extended Data Fig. 5 Visualization of read pairs from a single transcribed molecule and detailed comparison of burst kinetics inference based on Smart-seq2-UMI and Smart-seq3 data.
(a) Visualization of read pairs sequenced from one molecule from the Cox7a2l locus. The top shows the exons and introns in the Cox7a2l locus, with genomic coordinates (mm10). Each row shows a unique read pair, where orange boxes show the mapping of sequences onto the genomic locus, dotted lines indicate that the sequences are connected by the read pair and solid lines indicate that the exon-intron junction was captured in the sequenced reads. Note that all read pairs combined span essentially the full transcript, meaning that the full transcript could be reconstructed for this molecule. (b) Coverage over CAST SNPs in 5’ UMI-containing read pairs in 369 mouse fibroblasts. Shown is the coverage in number of read pairs in a 10 bp sliding window of SNP distance relative to the TSS of their gene. Blue line indicates a loess fit of the data points. (c) Scatter plots showing the burst frequencies inferred for the C57 (x-axis) and CAST (y-axis) alleles for genes in mouse fibroblasts. The left plot shows the results based on Smart-seq3 data and the right plot shows the results from using Smart-seq2-UMI data. (d) Scatter plots showing the burst sizes inferred for the C57 (x-axis) and CAST (y-axis) alleles for genes in mouse fibroblasts. The left plot shows the results based on Smart-seq3 data and the right plot shows the results from using Smart-seq2-UMI data.
(a) Scatter plot showing the observed RNA molecule length with PacBio (x-axis) against the reconstructed RNA molecule length with Smart-seq3 (y-axis). Molecules were matched by the sequencing of the same UMI per gene in the respective sequencing method. (b) Detailed information on the Col1a2 gene, for which Smart-seq3 had reconstructed 1,903 bp. The reconstruction was consistent with the 2,267 bp transcript sequenced by PacBio, and both were shorter than the 5,348 bp full-length Ensembl annotation. The two gaps in the Smart-seq3 reconstruction are shown as red lines. (c) Scatter plots showing the correspondence between observed molecules from PacBio sequencing (x-axis) and the observed molecules from Illumina sequencing of Smart-seq3 cDNA libraries from two mouse fibroblast cells. (d) Pie chart showing the consistency in isoform assignment between the PacBio and Smart-seq3 assigned isoforms.
(a) Overview illustrations of the two sets of DIY spikes designed (set 1 and set 2). Each set contains a full-length isoform and four shorter isoforms that harbor connected genetic variation in positions 5 to 35 (marked in red) and 100 bp downstream exclusions. (b) Cumulative percentage of spiked-in molecules that had their 5’-most base within the first five bases of the designed spike-in sequence. (c) Table summarizing correct and incorrect assignment of read pairs to each diySpike isoform. (d) Histogram showing the read depth over the diySpikes, that is, the number of read pairs per observed molecule. (e) Fraction of molecules assigned to a unique isoform or compatible with two or more isoforms as a function of the number of unique read pairs from each molecule.
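The relationship described in panel (e), where more read pairs per molecule leave fewer compatible isoforms, can be sketched as a simple set comparison. This is an illustrative simplification under stated assumptions, not the assignment procedure used in the study: isoforms are modelled as sets of exon labels, and a molecule is summarized by the exons its read pairs cover and the exons its junction-spanning reads skip. All names and labels below are hypothetical.

```python
def compatible_isoforms(included, excluded, isoforms):
    """Return the isoforms consistent with the evidence from one molecule.

    included: exons covered by at least one read pair of the molecule
    excluded: exons skipped by a junction-spanning read of the molecule
    isoforms: mapping from isoform name to the set of exons it contains
    """
    return sorted(
        name
        for name, exons in isoforms.items()
        if included <= exons and not (excluded & exons)
    )


# Hypothetical annotation: one full-length isoform and two exon-skipping isoforms.
isoforms = {
    "iso_full": {"e1", "e2", "e3", "e4"},
    "iso_skip2": {"e1", "e3", "e4"},
    "iso_skip3": {"e1", "e2", "e4"},
}

# A single read pair covering only the first exon is compatible with every isoform.
few_reads = compatible_isoforms({"e1"}, set(), isoforms)

# Additional read pairs cover e2 and e4 and include a junction read that skips e3.
more_reads = compatible_isoforms({"e1", "e2", "e4"}, {"e3"}, isoforms)

print(few_reads)
print(more_reads)
```

With one read pair the molecule stays ambiguous across all three hypothetical isoforms, whereas the additional read pairs narrow it to a single one.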
(a) Scatter plot showing the number of reads that aligned to human (x-axis) and mouse (y-axis) for the complex HCA sample that contained human, mouse and dog cells. (b) Scatter plot showing the number of reads that aligned to human (x-axis) and dog (y-axis) for the complex HCA sample that contained human, mouse and dog cells. Few cells show any signal towards more than one genome, demonstrating a very low doublet rate. (c) Percentage of unmapped read pairs, and read pairs that aligned to exonic, intronic and intergenic regions, separated per protocol (Smart-seq2 and Smart-seq3) and experiment (HEK293FT, mouse fibroblasts, HCA cells). (d) Mapping statistics for 5’ UMI-containing read pairs in Smart-seq3: percentage of unmapped read pairs, and read pairs that aligned to exonic, intronic and intergenic regions, separated per experiment (HEK293FT, mouse fibroblasts, HCA cells). The boxplots in (c and d) show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 lengths of the box.
Extended Data Fig. 9 Gene detection of Smart-seq3 and HCA comparison against other scRNA-seq methods.
(a) Number of genes detected in Smart-seq3 and Smart-seq2 as a function of sequencing depth. The median number of genes detected across cells is represented as dots and the lines indicate the first and third quartiles. Separate plots were generated for cells of different cell types, as indicated on top of each panel. The respective sample sizes for Smart-seq3 and Smart-seq2 are: B-cell (n = 366, n = 112 cells), CD4+ T-cell (n = 1,270, n = 356 cells), CD8+ T-cell (n = 665, n = 222 cells), HEK cell (n = 236, n = 62 cells), Monocyte (n = 200, n = 302 cells), NK-cell (n = 352, n = 152 cells). (b) UMAP visualizations of sequenced HCA sample cells on different scRNA-seq protocols (data from Mereu et al. 2019), colored according to the Louvain clustering performed independently on cells from each protocol. The same computational pipeline and parameters were used for these analyses as in Figure 3a, except requiring a depth of just 10,000 reads per cell. Please note that this analysis is not intended to be a thorough benchmarking of methods, as the data have merely been scaled and not sub-sampled to account for differences in sequencing depths or cell numbers between protocols. Instead, the full data per protocol (Quartz-seq2: n = 1,422 cells, CEL-seq2: n = 750, Smart-seq2: n = 1,160, 10x v2: n = 3,592, 10x v3: n = 6,175 cells) were analyzed with a standardized scRNA-seq analysis pipeline, revealing that the B-cells do not easily separate with these other methods.
Extended Data Fig. 10 HCA cluster markers, donor details and additional information on isoform assignments.
(a) Violin plots showing the distribution of expression for genes across all cell type clusters revealed in Figure 3d (total number of cells n = 3,129). Genes were selected to inform on overall types of cells and known sub-type markers. (b) UMAP of HCA sample cells (n = 3,129) with Smart-seq3 (as in Figure 3d) but colored according to donors. (c) Matrix showing the fraction of reconstructed molecules that could be assigned to either one or N isoforms, where molecules were first grouped by the number of annotated isoforms available for their genes. (d) Boxplots showing the number of molecules per cell (Smart-seq3 n = 3,129 cells, 10x v3 n = 6,175 cells) with unique assignments in Smart-seq3 and 10x Genomics (v3) for genes with more than one annotated isoform in Ensembl. Boxes indicate first and third quartiles with the median marked in white. Whiskers indicate the most extreme data point within 1.5 lengths of the box.
Supplementary Note 1
List of all the reaction conditions shown in Supplementary Fig. 1.
List of mouse strain-specific isoform expression detected in primary fibroblasts.
Tables that list genes and isoforms with significant usage differences between human cell types.
Cite this article
Hagemann-Jensen, M., Ziegenhain, C., Chen, P. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol 38, 708–714 (2020). https://doi.org/10.1038/s41587-020-0497-0