Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells

Journal name:
Nature Biotechnology


Genome-wide transcriptome analyses are routinely used to monitor tissue-, disease- and cell type–specific gene expression, but it has been technically challenging to generate expression profiles from single cells. Here we describe a robust mRNA-Seq protocol (Smart-Seq) that is applicable down to single-cell levels. Compared with existing methods, Smart-Seq has improved read coverage across transcripts, which enhances detailed analyses of alternative transcript isoforms and identification of single-nucleotide polymorphisms. We determined the sensitivity and quantitative accuracy of Smart-Seq for single-cell transcriptomics by evaluating it on total RNA dilution series. We found that although gene expression estimates from single cells have increased noise, hundreds of differentially expressed genes could be identified using few cells per cell type. Applying Smart-Seq to circulating tumor cells from melanomas, we identified distinct gene expression patterns, including candidate biomarkers for melanoma circulating tumor cells. Our protocol will be useful for addressing fundamental biological problems requiring genome-wide transcriptome profiling in rare cells.

At a glance


    Figure 1: Smart-Seq read coverage across transcripts.

    (a) Comparison of read coverage over transcripts for Smart-Seq–analyzed mouse oocytes (n = 3) and previously published mouse oocyte transcriptome data (ref. 7; n = 2). Transcripts were grouped according to annotated lengths and analyzed separately, with the transcript length ranges indicated (top right). We display the read coverage over the transcripts as a distance from the 3′ end, with the vertical dashed gray line showing the length of the shortest included transcripts after which a decline in read coverage is expected. Error bars represent s.d. among biological replicates. (b) Mean read coverage over transcripts for Smart-Seq data generated from diluted amounts of mouse brain RNA. Independent dilution series (including data from different laboratories) are shown as separate data sets, and sample numbers are listed from uppermost line down. For comparison, we included data from standard mRNA-Seq on 100 ng of mouse brain RNA (non-amplified). Error bars, s.d. (n = 5, 3, 4 and 4 for lines top to bottom). (c) Read coverage (as in b) for 12 individual human cells of prostate and bladder cancer line origin, analyzed using Smart-Seq (cancer cells; n = 12) and for prostate cell line LNCaP analyzed with standard mRNA-Seq (non-amplified; n = 4). Error bars, s.d.

    Figure 2: Sensitivity and variability in Smart-Seq from few or single cells.

    (a) Percentage of genes reproducibly detected in replicate pairs, binned according to expression level. We performed all pair-wise comparisons within groups of replicates and report the mean and 90% confidence interval. We used Smart-Seq data generated from diluted amounts of human UHRR total RNA as indicated. As controls, we added both a comparison of technical replicates of human UHRR analyzed using standard mRNA-Seq protocols with 100 ng input RNA (non-amplified) as well as a comparison of human UHRR and brain RNA from standard mRNA-Seq data. (b) Percentage of genes reproducibly detected within replicate pairs, binned according to expression level (as in a) for human LNCaP, PC3 and T24 cells. We show pair-wise comparisons among single cells from the same cancer cell line (blue), among multiple cells of the same cell line (purple and blue), and comparisons among single cells from different cancer cell lines (yellow). (c) Standard deviation in gene expression estimates within replicates in bins of genes sorted according to expression levels. Error bars, s.e.m. (n ≥ 10). (d) Standard deviation in gene expression estimates in replicates (as in c). (e–g) Scatter plots showing the relative differences between human UHRR and brain gene expression levels estimated from standard mRNA-Seq data on 100 ng input RNA (x axis) and Smart-Seq generated data (y axis) starting from 1 ng total RNA (e), 100 pg total RNA (f) and 10 pg total RNA (g). Correlation coefficients, computed from log2-transformed relative gene expression profiles, are shown together with nonlinear loess regression curves (green) and y = x lines (red).

    Figure 3: Transcriptional and post-transcriptional analyses of cancer cell line cells using Smart-Seq.

    (a) Categorization of individual cells according to cell line of origin using single-cell Smart-Seq transcriptomes. Singular-value decomposition (SVD) analysis was conducted for 12 individual cancer cells (four cells each from the PC3, LNCaP and T24 cancer cell lines) based on global gene expression profiles. Projections are shown based on the first two dimensions that capture most of the variance. The numbers of significantly differentially expressed genes per pair-wise cell line comparison are shown next to the arrows (P < 0.05, one-way ANOVA and Tukey post-hoc test). (b) Mean number of exons with sufficient read coverage for MISO analyses of exon inclusion levels in sequence-depth matched single-cell mRNA-Seq data. Smart-Seq data from diluted mouse brain RNA (green) compared with previously published mouse ESCs (ref. 8) and ESC-derived cells (red) and 12 Smart-Seq–analyzed individual prostate and bladder cell line cells (purple). Individual RNA or cell measurements are plotted. (c) Single-cell Smart-Seq reads mapping to a portion of the NEDD4L gene locus from four individual T24 and LNCaP cells. Read coverage is shown as a heatmap with darker blue indicating higher read coverage. (d) Number of differentially included exons identified among the PC3, LNCaP and T24 cell lines from single-cell Smart-Seq analysis on four cells per cell line as a function of estimated false discovery rate.

    Figure 4: Single-cell transcriptomes of circulating tumor cells.

    (a) Hierarchical clustering of human samples based on gene expression of highly expressed genes (>100 RPKM). Coloring indicates high-order clusters, and the confidence in clusters is indicated with bootstrap values (percentage). Samples analyzed include human immune samples (Burkitt's lymphoma cell lines BL41 and BJAB, and white blood cells and lymph node samples) and cells from putative melanoma CTCs (CTC), primary melanocytes (PM), melanoma cell lines SKMEL5 (SKMEL) and UACC257 (UACC), prostate cancer cell lines (LNCaP and PC3), bladder cancer cell line (T24) and human embryonic stem cells (ESC). (b) Expression of melanocyte markers (PMEL, MITF, TYR and MLANA) and immune marker PTPRC in single-cell transcriptomes from a, with Burkitt's lymphoma cell lines BL41 and BJAB (BL). (c) Gene expression levels in CTCs for an unbiased set of 100 immune and melanoma markers. (d–f) Heatmaps showing relative expression of melanoma-associated tumor antigens (d), upregulated plasma-membrane proteins (e), and downregulated plasma-membrane proteins (f) in single-cell transcriptomes as in b with the addition of more immune samples (W, white blood cells; L, lymph node). (g) Number of reads from individual PMs and putative CTCs that support the reference (G) or risk (A) allele for the melanoma-associated SNP (rs1126809).
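
The reproducibility measure described in the caption of Figure 2a (percentage of genes detected in both members of a replicate pair, binned by expression level and averaged over all pair-wise comparisons) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `detection_by_bin`, the choice to bin each gene by its mean expression across the pair, and the detection threshold of zero are all assumptions made for illustration; the caption does not specify them.

```python
from itertools import combinations
from statistics import mean


def detection_by_bin(replicates, bin_edges, threshold=0.0):
    """Mean percentage of genes detected in both replicates, per expression bin.

    replicates: list of dicts mapping gene name -> expression estimate (e.g. RPKM).
    bin_edges:  ascending edges; bin i covers [bin_edges[i], bin_edges[i + 1]).
    threshold:  a gene counts as detected when its expression exceeds this value
                (assumed here; not taken from the paper).
    """
    n_bins = len(bin_edges) - 1
    per_bin = [[] for _ in range(n_bins)]

    # All pair-wise comparisons within the group of replicates.
    for a, b in combinations(replicates, 2):
        total = [0] * n_bins
        detected = [0] * n_bins
        for gene in a.keys() & b.keys():
            # Assumption: a gene is binned by its mean expression across the pair.
            level = (a[gene] + b[gene]) / 2
            for i in range(n_bins):
                if bin_edges[i] <= level < bin_edges[i + 1]:
                    total[i] += 1
                    if a[gene] > threshold and b[gene] > threshold:
                        detected[i] += 1
                    break
        for i in range(n_bins):
            if total[i]:
                per_bin[i].append(100.0 * detected[i] / total[i])

    # Average over pairs; bins that never received a gene yield None.
    return [mean(values) if values else None for values in per_bin]


# Toy expression values (hypothetical, for illustration only).
r1 = {"A": 2.0, "B": 50.0, "C": 0.0, "D": 4.0}
r2 = {"A": 0.0, "B": 60.0, "C": 0.0, "D": 6.0}
r3 = {"A": 2.0, "B": 40.0, "C": 0.0, "D": 0.0}
result = detection_by_bin([r1, r2, r3], bin_edges=[0.5, 10, 100])
```

In this toy input the low-expression bin shows lower reproducibility than the high-expression bin, the qualitative pattern the caption describes for limited input RNA. A confidence interval over the pair-wise values, as reported in the figure, could be derived from the per-pair percentages before averaging.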

Accession codes

Primary accessions

Gene Expression Omnibus


  1. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  2. Guttman, M. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010).
  3. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
  4. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
  5. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
  6. Kurimoto, K. et al. An improved single-cell cDNA amplification method for efficient high-density oligonucleotide microarray analysis. Nucleic Acids Res. 34, e42 (2006).
  7. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
  8. Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468–478 (2010).
  9. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
  10. Iscove, N.N. et al. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat. Biotechnol. 20, 940–943 (2002).
  11. Katz, Y., Wang, E.T., Airoldi, E.M. & Burge, C.B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
  12. Talasaz, A.H. et al. Isolating highly enriched populations of circulating epithelial cells and other rare cells from blood using a magnetic sweeper device. Proc. Natl. Acad. Sci. USA 106, 3970–3975 (2009).
  13. Shukla, S. et al. CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature 479, 74–79 (2011).
  14. Jungbluth, A.A. et al. Expression of melanocyte-associated markers gp-100 and Melan-A/MART-1 in angiomyolipomas. An immunohistochemical and rt-PCR analysis. Virchows Arch. 434, 429–435 (1999).
  15. Tomita, Y., Montague, P.M. & Hearing, V.J. Anti-T4-tyrosinase monoclonal antibodies–specific markers for pigmented melanocytes. J. Invest. Dermatol. 85, 426–430 (1985).
  16. Fang, D. & Setaluri, V. Role of microphthalmia transcription factor in regulation of melanocyte differentiation marker TRP-1. Biochem. Biophys. Res. Commun. 256, 657–663 (1999).
  17. Chomez, P. et al. An overview of the MAGE gene family with the identification of all human members of the family. Cancer Res. 61, 5544–5551 (2001).
  18. Tang, A. et al. E-cadherin is the major mediator of human melanocyte adhesion to keratinocytes in vitro. J. Cell Sci. 107, 983–992 (1994).
  19. Duncan, L.M. et al. Down-regulation of the novel gene melastatin correlates with potential for melanoma metastasis. Cancer Res. 58, 1515–1520 (1998).
  20. Gudbjartsson, D.F. et al. ASIP and TYR pigmentation variants associate with cutaneous melanoma and basal cell carcinoma. Nat. Genet. 40, 886–891 (2008).
  21. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  22. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  23. Ramsköld, D., Wang, E.T., Burge, C.B. & Sandberg, R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLOS Comput. Biol. 5, e1000598 (2009).
  24. Bengtsson, M., Ståhlberg, A., Rorsman, P. & Kubista, M. Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels. Genome Res. 15, 1388–1392 (2005).
  25. Au, K.F., Jiang, H., Lin, L., Xing, Y. & Wong, W.H. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res. 38, 4570–4578 (2010).
  26. Bullard, J.H., Purdom, E., Hansen, K.D. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
  27. Sam, L.T. et al. A comparison of single molecule and amplification based sequencing of cancer transcriptomes. PLoS ONE 6, e17305 (2011).
  28. Wall, M.E., Dyck, P.A. & Brettin, T.S. SVDMAN–singular value decomposition analysis of microarray data. Bioinformatics 17, 566–568 (2001).
  29. Berger, M.F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).
  30. Zawada, A.M. et al. SuperSAGE evidence for CD14++CD16+ monocytes as a third monocyte subset. Blood 118, e50–e61 (2011).
  31. Bernstein, B.E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
  32. Allison, D.B., Cui, X., Page, G.P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65 (2006).
  33. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  34. McKenna, A. et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  35. Sherry, S.T., Ward, M. & Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).

Author information

  1. These authors contributed equally to this work.

    • Daniel Ramsköld &
    • Shujun Luo


  1. Ludwig Institute for Cancer Research, Stockholm, Sweden.

    • Daniel Ramsköld,
    • Qiaolin Deng,
    • Omid R Faridani &
    • Rickard Sandberg
  2. Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden.

    • Daniel Ramsköld &
    • Rickard Sandberg
  3. Illumina, Inc., Hayward, California, USA.

    • Shujun Luo,
    • Robin Li,
    • Irina Khrebtukova &
    • Gary P Schroth
  4. Department of Chemical Physiology, Center for Regenerative Medicine, The Scripps Research Institute, San Diego, La Jolla, California, USA.

    • Yu-Chieh Wang &
    • Jeanne F Loring
  5. Rebecca and John Moores Cancer Center, San Diego, La Jolla, California, USA.

    • Gregory A Daniels
  6. Department of Reproductive Medicine, University of California, San Diego, La Jolla, California, USA.

    • Louise C Laurent


D.R. designed and performed the computational analyses of sequencing reads, prepared figures, tables and methods, and contributed manuscript text. S.L. and R.L. developed protocols and created libraries. I.K. and S.L. did primary data analysis. Y.-C.W., G.A.D. and J.F.L. prepared melanoma circulating tumor cells, melanocytes and melanoma cell line cells. O.R.F. and Q.D. contributed additional sequencing libraries. L.C.L. and G.P.S. contributed to study design and manuscript text. R.S. designed the study and prepared the manuscript, with input from other authors.

Competing financial interests

S.L., R.L., I.K. and G.P.S. are employees and shareholders of Illumina.

Supplementary information

PDF files

  1. Supplementary Text and Figures (1M)

    Supplementary Figs. 1–11

Excel files

  1. Supplementary Table 1 (45K)

    List of Smart-Seq and standard mRNA-Seq data generated

  2. Supplementary Table 2 (16K)

    List of studies reporting total RNA amount per cell for different mammalian cell types

  3. Supplementary Table 3 (45K)

    List of exons with significantly different inclusion levels in cancer cell line cells

  4. Supplementary Table 4 (5M)

    Differentially expressed genes between circulating tumor cells, primary melanocytes and melanoma cell lines

  5. Supplementary Table 5 (16K)

    Functional categories enriched among differentially expressed genes
