Computational and analytical challenges in single-cell transcriptomics

Key Points

  • Until recently, RNA profiling was limited to ensemble-based approaches, which average over bulk populations of cells. Technological advances in single-cell RNA sequencing (scRNA-seq) now enable the transcriptomes of large numbers of individual cells to be assayed in an unbiased manner.

  • To ensure that scRNA-seq data are fully exploited and interpreted correctly, it is important to apply appropriate computational and statistical approaches. Methods and principles previously developed for bulk RNA sequencing can be reused for this purpose; however, scRNA-seq data analysis poses several unique challenges that require new analytical strategies.

  • At the experimental design stage, researchers should consider including unique molecular identifiers (UMIs) and quantitative standards such as spike-ins, which allow accurate normalization and quality control of the raw data.

  • Prior to using scRNA-seq data for biological discovery, it is important to consider both technical variability and confounding factors such as batch effects, the cell cycle or apoptosis. Computational methods that account for technical variation and remove confounding factors are beginning to emerge.

  • The processed and normalized scRNA-seq data provide unique analysis opportunities that allow novel biological discoveries to be made. These include identification and characterization of cell types and the study of their organization in space and/or time; inference of gene regulatory networks and their robustness across individual cells; and characterization of the stochastic component of transcription.

Abstract

The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.

Figure 1: Comparison of bulk and scRNA-seq analytical strategies.
Figure 2: Quality control and normalization.
Figure 3: Confounding variables and how to account for them.
Figure 4: Finding new cell types and allocating cells along a differentiation cascade.
Figure 5: The kinetics of transcription.

References

  1. Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  2. Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343–348 (2011).

  3. Blekhman, R., Oshlack, A., Chabot, A. E., Smyth, G. K. & Gilad, Y. Gene regulation in primates evolves under tissue-specific selection pressures. PLoS Genet. 4, e1000271 (2008).

  4. Deng, Q., Ramskold, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).

  5. Barreiro, L. B. et al. Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc. Natl Acad. Sci. USA 109, 1204–1209 (2012).

  6. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).

  7. Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Rev. Genet. 14, 618–630 (2013). This is a related review discussing challenges and analysis opportunities of single-cell sequencing, for example, to reconstruct lineages in cancer.

  8. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).

  9. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).

  10. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5, 621–628 (2008).

  11. Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).

  12. Perry, G. H. et al. Comparative RNA sequencing reveals substantial genetic variation in endangered primates. Genome Res. 22, 602–610 (2012).

  13. van 't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).

  14. Sandberg, R. Entering the era of single-cell transcriptomics in biology and medicine. Nature Methods 11, 22–24 (2014).

  15. Ohnishi, Y. et al. Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nature Cell Biol. 16, 27–37 (2014).

  16. Skamagki, M., Wicher, K. B., Jedrusik, A., Ganguly, S. & Zernicka-Goetz, M. Asymmetric localization of Cdx2 mRNA during the first cell-fate decision in early mouse development. Cell Rep. 3, 442–457 (2013).

  17. Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468–478 (2010).

  18. Diez-Roux, G. et al. A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol. 9, e1000582 (2011).

  19. Munsky, B., Neuert, G. & van Oudenaarden, A. Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012).

  20. Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226 (2008).

  21. Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W. & Prasher, D. C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).

  22. Coons, A. H., Creech, H. J. & Jones, R. N. Immunological properties of an antibody containing a fluorescent group. Proc. Soc. Exp. Biol. Med. 47, 200–202 (1941).

  23. Taniguchi, K., Kajiyama, T. & Kambara, H. Quantitative analysis of gene expression in a single cell by qPCR. Nature Methods 6, 503–506 (2009).

  24. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods 5, 877–879 (2008).

  25. Faddah, D. A. et al. Single-cell analysis reveals that expression of nanog is biallelic and equally variable as that of other pluripotency factors in mouse ESCs. Cell Stem Cell 13, 23–29 (2013).

  26. Tang, F. et al. mRNA-seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377–382 (2009).

  27. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).

  28. Ramskold, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotech. 30, 777–782 (2012).

  29. Sasagawa, Y. et al. Quartz-seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 14, R31 (2013).

  30. Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).

  31. Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-seq: single-cell RNA-seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).

  32. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nature Methods 10, 1096–1098 (2013). Recent protocol developments, such as the development of Smart-seq2, have helped to substantially reduce biases and improved the sensitivity of scRNA-seq.

  33. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nature Methods 10, 1093–1095 (2013). This paper reports a statistical approach that estimates and accounts for technical sources of variation in scRNA-seq experiments. This method exploits spike-ins to separate technical and biological variability of individual genes (see also reference 75).

  34. Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nature Methods 11, 41–46 (2014).

  35. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014). This paper provides an example in which sequencing the transcriptomes of a large number of single cells provided important insights into intra- and inter-tumour heterogeneity.

  36. Shalek, A. K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 509, 363–369 (2014).

  37. Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).

  38. Oshlack, A., Robinson, M. D. & Young, M. D. From RNA-seq reads to differential expression results. Genome Biol. 11, 220 (2010).

  39. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).

  40. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nature Methods 11, 163–166 (2014). UMIs allow individual molecules to be barcoded. This protocol enables the absolute number of transcribed molecules to be estimated independently of amplification biases.

  41. Fonseca, N. A., Rung, J., Brazma, A. & Marioni, J. C. Tools for mapping high-throughput sequencing data. Bioinformatics 28, 3169–3177 (2012).

  42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).

  43. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).

  44. Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotech. 28, 503–510 (2010).

  45. Anders, S., Pyl, P. T. & Huber, W. HTSeq – a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).

  46. Davis, M. P., van Dongen, S., Abreu-Goodger, C., Bartonicek, N. & Enright, A. J. Kraken: a set of tools for quality control and analysis of high-throughput sequence data. Methods 63, 41–49 (2013).

  47. Robinson, J. T. et al. Integrative genomics viewer. Nature Biotech. 29, 24–26 (2011).

  48. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 14, 178–192 (2013).

  49. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).

  50. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010). This seminal paper describes statistical methods to test for differential gene expression using RNA-seq data. Although developed in the context of RNA-seq studies on bulk cell populations, this work has laid the foundation for a large family of normalization procedures, including recent methods that are dedicated to scRNA-seq data (see reference 33).

  51. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  52. Lin, C. Y. et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151, 56–67 (2012).

  53. Loven, J. et al. Revisiting global gene expression analysis. Cell 151, 476–482 (2012).

  54. Krebs, J. E., Goldstein, E. S. & Kilpatrick, S. T. Lewin's Genes XI (Jones & Bartlett Publishers, 2014).

  55. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nature Methods 11, 740–742 (2014). This paper presents a Bayesian approach to test for differential gene expression in scRNA-seq studies. This approach extends methods for bulk RNA-seq (for example, reference 50) by accounting for single-cell-specific noise, such as dropout events and amplification biases.

  56. Pickrell, J. K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010).

  57. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).

  58. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6, e1000770 (2010).

  59. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature Protoc. 7, 500–507 (2012).

  60. Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotech. 32, 896–902 (2014).

  61. Buettner, F. et al. Accounting for cell-to-cell heterogeneity in single-cell RNA-seq data reveals novel structure between cells. Nature Biotech. http://dx.doi.org/10.1038/nbt.3102 (2015). Confounding factors such as the cell cycle can obscure biologically relevant molecular signatures in scRNA-seq data sets. This work describes a computational approach to account for confounding factors. Related methods developed for bulk RNA profiling experiments are described in references 57–60.

  62. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).

  63. Durruthy-Durruthy, R. et al. Reconstruction of the mouse otocyst and early neuroblast lineage at single-cell resolution. Cell 157, 964–978 (2014).

  64. Moignard, V. et al. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature Cell Biol. 15, 363–372 (2013).

  65. Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014). This paper provides an example from T cell biology that shows how gene–gene correlations in scRNA-seq studies can be used to reveal novel mechanistic insights.

  66. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotech. 32, 381–386 (2014). This paper describes a computational approach to reconstruct a pseudotemporal order from multiple scRNA-seq snapshot experiments, for example, along a differentiation trajectory.

  67. Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360–1363 (2014).

  68. Lovatt, D. et al. Transcriptome in vivo analysis (TIVA) of spatially defined single cells in live tissue. Nature Methods 11, 190–196 (2014).

  69. Pettit, J. B., Tomer, R., Achim, K., Azizi, L. & Marioni, J. C. Identifying cell types from spatially referenced single-cell expression datasets. PLoS Comput. Biol. 10, e1003824 (2014).

  70. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  71. Hardcastle, T. J. & Kelly, K. A. baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11, 422 (2010).

  72. Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).

  73. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).

  74. Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods 7, 1009–1015 (2010).

  75. Grun, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nature Methods 11, 637–640 (2014).

  76. Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).

  77. Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genet. 34, 166–176 (2003).

  78. Liao, J. C. et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA 100, 15522–15527 (2003).

  79. Bansal, M., Belcastro, V., Ambesi-Impiombato, A. & di Bernardo, D. How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78 (2007).

  80. Pe'er, D., Regev, A., Elidan, G. & Friedman, N. Inferring subnetworks from perturbed expression profiles. Bioinformatics 17, S215–S224 (2001).

  81. Kim, J. K. & Marioni, J. C. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 14, R7 (2013).

  82. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).

  83. Kaern, M., Elston, T. C., Blake, W. J. & Collins, J. J. Stochasticity in gene expression: from theories to phenotypes. Nature Rev. Genet. 6, 451–464 (2005).

  84. Larson, D. R. What do expression dynamics tell us about the mechanism of transcription? Curr. Opin. Genet. Dev. 21, 591–599 (2011).

  85. Schwanhausser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).

  86. McManus, C. J. et al. Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 20, 816–825 (2010).

Acknowledgements

The authors acknowledge members of the Marioni, Stegle and Teichmann groups for comments on the manuscript. They also acknowledge S. Linnarsson for advice on how to present computational challenges relating to scRNA-seq data generated using UMI-based protocols.

Author information

Correspondence to John C. Marioni.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Spike-in

A small set of RNA species of known sequence and quantity (either synthetic or taken from a pool of RNA from a distantly related species) that are added to each sample as internal controls in RNA sequencing experiments.

Unique molecular identifiers

(UMIs). A pool of tens of thousands of distinct short DNA sequences (6–10 nucleotides in length) that are incorporated into individual molecules before amplification, so that amplification biases can be accounted for.
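To make the definition concrete, the sketch below collapses aligned reads into molecule counts: reads that share a cell barcode, gene and UMI are treated as PCR copies of a single molecule. It assumes reads have already been aligned and assigned to a cell and a gene; the `(cell_barcode, gene, umi)` tuple layout, the function name `umi_counts` and the example sequences are hypothetical. It also ignores sequencing errors within UMIs, which practical pipelines usually handle by merging near-identical UMIs.

```python
from collections import defaultdict


def umi_counts(reads):
    """Collapse aligned reads into per-cell, per-gene molecule counts.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Returns {cell_barcode: {gene: number of distinct UMIs}}.
    """
    molecules = defaultdict(set)
    for cell, gene, umi in reads:
        # PCR duplicates of one original molecule share the same (cell, gene, UMI),
        # so storing UMIs in a set keeps exactly one copy per molecule.
        molecules[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


reads = [
    ("cell1", "Actb", "AAGTCC"),
    ("cell1", "Actb", "AAGTCC"),  # PCR duplicate of the read above
    ("cell1", "Actb", "CCTAGA"),
    ("cell1", "Nanog", "GGATTC"),
    ("cell2", "Actb", "AAGTCC"),
]
print(umi_counts(reads))
# {'cell1': {'Actb': 2, 'Nanog': 1}, 'cell2': {'Actb': 1}}
```

Counting reads instead would report three for Actb in cell1; counting distinct UMIs reports two molecules, which is the quantity that is robust to uneven amplification.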

Technical variability

Variability in gene expression levels between cells that arises through technical effects.
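Because spike-ins are added to every cell in the same amount, their cell-to-cell variation can serve as a yardstick for technical variability. A simplified sketch in the spirit of reference 33 fits the squared coefficient of variation (CV²) of the spike-ins as a function of mean expression, CV² ≈ a0 + a1/mean, and then compares each gene's observed CV² with the technical expectation at its mean. Plain least squares is used for brevity and no formal test is performed, so this is not the published procedure; the inputs are assumed to be normalized counts, and the function names and toy numbers are illustrative.

```python
from statistics import mean, pvariance


def cv2(values):
    """Squared coefficient of variation of one gene's normalized counts across cells."""
    m = mean(values)
    return pvariance(values) / (m * m)


def fit_technical_noise(spike_ins):
    """Least-squares fit of CV^2 = a0 + a1 / mean across spike-in rows.

    Spike-ins are added at the same amount to every cell, so their
    cell-to-cell variation is assumed to be purely technical.
    """
    xs = [1.0 / mean(row) for row in spike_ins]
    ys = [cv2(row) for row in spike_ins]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = s_xy / s_xx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def excess_variability(genes, a0, a1):
    """Observed CV^2 divided by the CV^2 expected from technical noise at that mean."""
    return {name: cv2(row) / (a0 + a1 / mean(row)) for name, row in genes.items()}


# Toy spike-ins whose variance equals their mean (Poisson-like technical noise)
spike_ins = [[2, 6, 2, 6], [6, 12, 6, 12], [12, 20, 12, 20]]
a0, a1 = fit_technical_noise(spike_ins)
genes = {"noisy": [0, 20, 0, 20], "quiet": [8, 12, 8, 12]}
print(excess_variability(genes, a0, a1))  # 'noisy' about 10, 'quiet' about 0.4
```

A ratio well above 1 suggests variability beyond what the spike-ins attribute to technical noise; a ratio near or below 1 is consistent with technical noise alone.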

Read alignment

The alignment of short reads generated from a next-generation sequencing experiment to a reference genome or transcriptome.

Gene expression counts

The number of sequencing reads or unique molecular identifiers that map to a particular gene. These raw data form the basis of gene expression level quantification approaches.

Duplicated reads

Identical copies of a sequencing read generated by the PCR amplification process.

Principal component analysis

(PCA). A statistical method to simplify a complex data set by transforming a series of correlated variables into a smaller number of uncorrelated variables called principal components.
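PCA is commonly applied to log-transformed, normalized single-cell expression values to display cells in two or three dimensions. The dependency-free sketch below computes only the first principal component, by power iteration on the gene-by-gene covariance matrix; the function name and toy matrix are assumptions, and a real analysis would normally call an SVD routine from a numerical library instead.

```python
import math
import random


def first_principal_component(data, n_iter=200, seed=0):
    """Leading principal component by power iteration on the covariance matrix.

    data: rows are cells, columns are genes (e.g. log-transformed normalized counts).
    Returns (loadings over genes, score of each cell on the component).
    """
    n, p = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in data]
    # p x p sample covariance matrix between genes
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)] for i in range(p)]

    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # the data have no variance at all
            break
        v = [x / norm for x in w]
    # project each centred cell onto the component
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, scores


# Genes 1 and 2 co-vary perfectly; gene 3 is constant and should get zero loading.
cells = [[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0], [4.0, 8.0, 5.0]]
loadings, scores = first_principal_component(cells)
print([round(x, 3) for x in loadings])  # [0.447, 0.894, 0.0]
```

The overall sign of a principal component is arbitrary; only the relative signs of loadings and scores carry meaning.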

Fragments per kilobase of exon per million fragments mapped

(FPKM). A method for quantifying gene expression levels from RNA sequencing data that normalizes for sequencing depth and transcript length.
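Written out, FPKM = fragments × 10⁹ / (exonic length in bp × total mapped fragments). A minimal sketch of that arithmetic, with hypothetical argument names:

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments: fragments assigned to the gene (or transcript)
    length_bp: exonic length of the gene in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    length_kb = length_bp / 1e3
    millions_mapped = total_mapped_fragments / 1e6
    return fragments / (length_kb * millions_mapped)


# 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

For protocols that sequence only one end of each transcript, as many UMI-based protocols do, the number of fragments does not grow with transcript length, so dividing by length is generally not appropriate there.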

Size factors

Quantities used to normalize gene expression levels between independently generated RNA sequencing libraries; they account for differences in sequencing depth.
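One widely used estimator is the median-of-ratios method associated with reference 50: each gene's count is divided by that gene's geometric mean across all libraries, and a library's size factor is the median of these ratios. The sketch below assumes a small dense cells-by-genes count table and uses only genes detected in every cell, because the logarithm of zero is undefined. In scRNA-seq data, where many genes have zero counts in some cells, this restriction can leave few usable genes, which is one reason single-cell-specific normalization strategies are needed. The function name and counts are illustrative.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per cell.

    counts: list of cells, each a list of raw counts over the same genes.
    Only genes with a non-zero count in every cell are used.
    """
    n_cells, n_genes = len(counts), len(counts[0])
    usable = [g for g in range(n_genes) if all(cell[g] > 0 for cell in counts)]
    if not usable:
        raise ValueError("no gene is detected in every cell")
    # log of each usable gene's geometric mean across cells (the pseudo-reference)
    log_ref = {g: sum(math.log(cell[g]) for cell in counts) / n_cells for g in usable}
    factors = []
    for cell in counts:
        log_ratios = [math.log(cell[g]) - log_ref[g] for g in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


counts = [
    [10, 20, 30, 0],  # cell 1
    [20, 40, 60, 5],  # cell 2: sequenced about twice as deeply as cell 1
    [5, 10, 15, 2],   # cell 3: about half as deeply; the last gene is skipped (zero in cell 1)
]
factors = size_factors(counts)
normalized = [[c / sf for c in cell] for cell, sf in zip(counts, factors)]
print([round(f, 3) for f in factors])  # [1.0, 2.0, 0.5]
```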

Allele-specific expression

Gene expression levels measured separately for each of the two parental alleles. RNA derived from each allele can be quantified and assessed separately when RNA sequencing reads overlap with heterozygous sites in the genome.
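When reads overlap a heterozygous site, each read can be assigned to one allele, and the allelic fraction indicates whether a gene is expressed from one or both alleles in a given cell. The sketch below is a toy classifier; the function name and the `min_reads` and `tol` thresholds are illustrative choices, not taken from any published method. In single cells, low capture efficiency can cause molecules from one allele to be missed by chance, so an apparently monoallelic call should be interpreted with caution.

```python
def allelic_call(ref_reads, alt_reads, min_reads=10, tol=0.05):
    """Classify one gene in one cell from reads covering a heterozygous site.

    ref_reads / alt_reads: reads carrying the reference / alternative allele.
    Returns 'low_coverage', 'monoallelic_ref', 'monoallelic_alt' or 'biallelic'.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return "low_coverage"
    ref_fraction = ref_reads / total
    if ref_fraction >= 1 - tol:
        return "monoallelic_ref"
    if ref_fraction <= tol:
        return "monoallelic_alt"
    return "biallelic"


for ref, alt in [(12, 0), (0, 15), (8, 7), (3, 2)]:
    print(ref, alt, allelic_call(ref, alt))
```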

Capture efficiency

The percentage of mRNA molecules in the cell lysate that are captured, amplified and sequenced. This is normally quantified using spike-in molecules.
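With UMI data, where counts approximate molecule numbers, capture efficiency can be estimated for each cell by comparing the detected spike-in molecules with the number added. The sketch assumes the number of molecules of each spike-in added to the lysate is known; the dictionary layout, function names and amounts are hypothetical (the identifiers only follow the ERCC naming style). The second function gives a crude estimate of molecules in the lysate and assumes that every transcript is captured as efficiently as the spike-ins, which is unlikely to hold exactly.

```python
def capture_efficiency(observed_umis, added_molecules):
    """Fraction of the spike-in molecules added to one cell's lysate that were detected.

    observed_umis: {spike_id: UMI count for that spike-in in this cell}
    added_molecules: {spike_id: number of molecules of that spike-in added}
    """
    added = sum(added_molecules.values())
    detected = sum(observed_umis.get(spike, 0) for spike in added_molecules)
    return detected / added


def estimated_molecules(umi_count, efficiency):
    """Crude estimate of molecules in the lysate, assuming uniform capture efficiency."""
    return umi_count / efficiency


added = {"ERCC-00002": 1000, "ERCC-00003": 200}  # illustrative amounts
observed = {"ERCC-00002": 110, "ERCC-00003": 10}
eff = capture_efficiency(observed, added)
print(eff)                           # 0.1
print(estimated_molecules(25, eff))  # approximately 250 molecules
```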

Confounding factors

Unobserved covariates that affect gene expression levels and that can obscure biological interpretation if they are not accounted for.
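When a per-cell proxy for a confounder is available, for example a cell-cycle score computed from known cell-cycle genes, the simplest correction is to remove its linear effect from each gene. The sketch below does this with ordinary least squares for a single covariate; the names and toy values are hypothetical. It assumes the confounder acts linearly and is not itself correlated with the biological signal of interest, otherwise real signal is removed as well; reference 61 describes a more sophisticated latent-variable approach.

```python
from statistics import mean


def regress_out(expression, covariate):
    """Remove the least-squares linear effect of one covariate from one gene.

    expression: the gene's (e.g. log-normalized) values across cells
    covariate: the confounder proxy for the same cells, in the same order
    The gene's mean is kept, so corrected values stay on the original scale.
    """
    c_bar, e_bar = mean(covariate), mean(expression)
    s_cc = sum((c - c_bar) ** 2 for c in covariate)
    s_ce = sum((c - c_bar) * (e - e_bar) for c, e in zip(covariate, expression))
    slope = s_ce / s_cc
    return [e - slope * (c - c_bar) for c, e in zip(covariate, expression)]


cycle_score = [0.0, 1.0, 2.0, 3.0]  # hypothetical per-cell cell-cycle score
gene = [1.0, 3.0, 5.0, 7.0]         # expression driven entirely by the cell cycle
print(regress_out(gene, cycle_score))  # [4.0, 4.0, 4.0, 4.0]
```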

Batch effects

Systematic differences in gene expression levels between independent cells from the same population, which arise as a result of sample preparation.

Biological replicates

Independent replicates from the same population.

Markov random field

(MRF). A particular class of statistical model that can exploit smoothness of measurements in a spatial grid, thereby improving the accuracy of parameter estimates.
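As an illustration of how an MRF encourages spatial smoothness, the sketch below assigns each position on a grid to one of a few classes using iterated conditional modes (ICM), a simple greedy optimizer: each site's cost combines a Gaussian data term with a penalty, weighted by `beta`, for disagreeing with its four neighbours. Class means and `sigma` are assumed known here, whereas practical methods estimate them jointly; all names are hypothetical, and this is not necessarily how reference 69 fits its model.

```python
def icm_labels(grid, class_means, sigma=1.0, beta=1.0, max_iter=20):
    """Label each position of a spatial grid by iterated conditional modes (ICM).

    grid: 2D list, one expression value per spatial position.
    class_means: expected expression value of each class (assumed known).
    Cost of class c at a site = Gaussian data term + beta * (number of
    4-neighbours currently carrying a different label).
    """
    rows, cols = len(grid), len(grid[0])
    classes = range(len(class_means))

    def data_cost(x, c):
        return (x - class_means[c]) ** 2 / (2 * sigma ** 2)

    # start from the best label for each site on its own (no smoothing)
    labels = [[min(classes, key=lambda c: data_cost(grid[i][j], c)) for j in range(cols)]
              for i in range(rows)]
    for _ in range(max_iter):
        changed = False
        for i in range(rows):
            for j in range(cols):
                neighbours = [labels[a][b]
                              for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                              if 0 <= a < rows and 0 <= b < cols]
                best = min(classes, key=lambda c: data_cost(grid[i][j], c)
                           + beta * sum(n != c for n in neighbours))
                if best != labels[i][j]:
                    labels[i][j] = best
                    changed = True
        if not changed:  # no site changed: a local minimum of the total cost
            break
    return labels


grid = [[0.0, 0.1, 0.0],
        [0.1, 0.8, 0.0],
        [0.0, 0.0, 0.1]]
print(icm_labels(grid, [0.0, 1.0], sigma=0.5))            # centre smoothed to class 0
print(icm_labels(grid, [0.0, 1.0], sigma=0.5, beta=0.0))  # centre stays class 1
```

With `beta=0` each site is labelled from its own measurement, so the isolated high value in the centre keeps class 1; with smoothing it is reassigned to match its neighbours.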

Dropout

The false quantification of a gene as 'unexpressed' due to the corresponding transcript being 'missed' during the reverse-transcription step. This leads to a lack of detection during sequencing.
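A rough diagnostic for dropout is to compare the fraction of cells in which a gene has zero counts with the fraction expected under a simple Poisson sampling model with the same mean, exp(-mean). The sketch below takes the counts of a single gene across cells; the name `zero_excess` and the toy counts are hypothetical. Excess zeros are consistent with dropout but can also reflect genuine on/off (bimodal) expression, so this check does not separate the two; reference 55 models dropout events explicitly.

```python
import math
from statistics import mean


def zero_excess(counts):
    """Observed fraction of cells with zero counts for one gene, minus the
    fraction of zeros expected under a Poisson model with the same mean."""
    observed = sum(c == 0 for c in counts) / len(counts)
    expected = math.exp(-mean(counts))
    return observed - expected


on_off = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
steady = [1, 2, 1, 2, 2, 1, 2, 1, 1, 2]
print(zero_excess(on_off))  # about 0.49: far more zeros than Poisson predicts
print(zero_excess(steady))  # negative: fewer zeros than Poisson predicts
```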

Monoallelic expression

The expression of only one of the two parental alleles.

About this article

Cite this article

Stegle, O., Teichmann, S. & Marioni, J. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 16, 133–145 (2015). https://doi.org/10.1038/nrg3833

  • DOI: https://doi.org/10.1038/nrg3833
