The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.