Abstract
Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.
This is a preview of subscription content, access via your institution
Relevant articles
Open Access articles citing this article.
-
scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing
BMC Genomics Open Access 02 May 2023
-
Identification of genetic variants that impact gene co-expression relationships using large-scale single-cell data
Genome Biology Open Access 18 April 2023
-
Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer’s disease: review, recommendation, implementation and application
Molecular Neurodegeneration Open Access 02 March 2022
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Rent or buy this article
Get just this article for as long as you need it
$39.95
Prices may be subject to local taxes which are calculated during checkout



References
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14, 618–630 (2013).
Stegle, O., Teichmann, S.A. & Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
Saliba, A.-E., Westermann, A.J., Gorski, S.A. & Vogel, J. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Res. 42, 8845–8860 (2014).
Gawad, C., Koh, W. & Quake, S.R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).
Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
Pierson, E. & Yau, C. ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16, 241 (2015).
Bacher, R. & Kendziorski, C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol. 17, 63 (2016).
Vallejos, C.A., Richardson, S. & Marioni, J.C. Beyond comparisons of means: understanding changes in gene expression at the single-cell level. Genome Biol. 17, 70 (2016).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
Kolodziejczyk, A.A., Kim, J.K., Svensson, V., Marioni, J.C. & Teichmann, S.A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620 (2015).
Lun, A.T., Bach, K. & Marioni, J.C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Smyth, G.K. & Speed, T. Normalization of cDNA microarray data. Methods 31, 265–273 (2003).
Bullard, J.H., Purdom, E., Hansen, K.D. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
Dillies, M.-A. et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinform. 14, 671–683 (2013).
Hicks, S.C., Teng, M. & Irizarry, R.A. On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. Preprint at http://biorxiv.org/content/early/2015/08/25/025528 (2015).
Risso, D., Ngai, J., Speed, T.P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
Leek, J.T. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 42, e161 (2014).
Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).
Grün, D. & van Oudenaarden, A. Design and analysis of single-cell sequencing experiments. Cell 163, 799–810 (2015).
Vallejos, C.A., Marioni, J.C. & Richardson, S. BASiCS: Bayesian analysis of single-cell sequencing data. PLoS Comput. Biol. 11, e1004333 (2015).
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. & Dewey, C.N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
Robinson, M.D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
Pollen, A.A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058 (2014).
Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F. & Theis, F.J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Ding, B. et al. Normalization and noise reduction for single cell RNA-seq experiments. Bioinformatics 31, 2225–2227 (2015).
Katayama, S., Töhönen, V., Linnarsson, S. & Kere, J. SAMstrt: statistical test for differential expression in single-cell transcriptome with spike-in normalization. Bioinformatics 29, 2943–2945 (2013).
Bacher, R. et al. SCnorm: a quantile-regression based approach for robust normalization of single-cell RNA-seq data. Nat. Methods http://dx.doi.org/10.1038/nmeth.4263 (2017).
Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nat. Methods 14, 309–315 (2017).
Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
Munro, S.A. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).
Goolam, M. et al. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos. Cell 165, 61–74 (2016).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Hardwick, S.A. et al. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat. Methods 13, 792–798 (2016).
Lovén, J. et al. Revisiting global gene expression analysis. Cell 151, 476–482 (2012).
Cole, M. & Risso, D. scone: Single Cell Overview of Normalized Expression data, R package version 0.99.6 (2016).
Kolodziejczyk, A.A. et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17, 471–485 (2015).
Acknowledgements
We thank several members of the Marioni laboratory (European Molecular Biology Laboratory - European Bioinformatics Institute, EMBL-EBI; Cancer Research UK - Cambridge Institute, CRUK-CI) for support and discussions throughout the preparation of this manuscript. In particular, we are grateful to A. Lun (CRUK-CI) for constructive comments on an earlier version of the manuscript. We are also grateful to UC Berkeley collaborator J. Ngai and his group members. C.A.V., A.S., and J.C.M. acknowledge core EMBL funding. C.A.V. was supported by core MRC funding (MRC MC UP 0801/1) and by The Alan Turing Institute under the EPSRC grant no. EP/N510129/1. J.C.M. acknowledges core support from CRUK. A.S. acknowledges funding from the Wellcome Trust Strategic Award 105031/D/14/Z, “Tracing early mammalian lineage decisions by single-cell genomics.” D.R. and S.D. are supported by the US National Institutes of Health BRAIN Initiative grant no. U01 MH105979 (PI, J. Ngai).
Author information
Authors and Affiliations
Contributions
C.A.V., D.R., and A.S. performed analyses. C.A.V., D.R., A.S., S.D., and J.C.M. wrote the manuscript. S.D. and J.C.M. supervised the study.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Data 1–3. (PDF 35324 kb)
Rights and permissions
About this article
Cite this article
Vallejos, C., Risso, D., Scialdone, A. et al. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods 14, 565–571 (2017). https://doi.org/10.1038/nmeth.4292
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/nmeth.4292
This article is cited by
-
scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing
BMC Genomics (2023)
-
Identification of genetic variants that impact gene co-expression relationships using large-scale single-cell data
Genome Biology (2023)
-
Normalizing cancer RNA-seq data for library size, tumor purity and batch effects
Nature Biotechnology (2023)
-
AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction
Frontiers of Computer Science (2023)
-
Best practices for single-cell analysis across modalities
Nature Reviews Genetics (2023)