Perspective

Normalizing single-cell RNA sequencing data: challenges and opportunities

Nature Methods volume 14, pages 565–571 (2017)

Abstract

Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. Here we discuss commonly used normalization approaches and illustrate how they can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.

Acknowledgements

We thank several members of the Marioni laboratory (European Molecular Biology Laboratory - European Bioinformatics Institute, EMBL-EBI; Cancer Research UK - Cambridge Institute, CRUK-CI) for support and discussions throughout the preparation of this manuscript. In particular, we are grateful to A. Lun (CRUK-CI) for constructive comments on an earlier version of the manuscript. We are also grateful to UC Berkeley collaborator J. Ngai and his group members. C.A.V., A.S., and J.C.M. acknowledge core EMBL funding. C.A.V. was supported by core MRC funding (MRC MC UP 0801/1) and by The Alan Turing Institute under the EPSRC grant no. EP/N510129/1. J.C.M. acknowledges core support from CRUK. A.S. acknowledges funding from the Wellcome Trust Strategic Award 105031/D/14/Z, “Tracing early mammalian lineage decisions by single-cell genomics.” D.R. and S.D. are supported by the US National Institutes of Health BRAIN Initiative grant no. U01 MH105979 (PI, J. Ngai).

Author information

Author notes

    • Davide Risso

    Present address: Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA.

    • Catalina A Vallejos, Davide Risso & Antonio Scialdone

    These authors contributed equally to this work.

Affiliations

  1. MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK.

    • Catalina A Vallejos
  2. EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.

    • Catalina A Vallejos, Antonio Scialdone & John C Marioni
  3. The Alan Turing Institute, British Library, London, UK.

    • Catalina A Vallejos
  4. Department of Statistical Science, University College London, London, UK.

    • Catalina A Vallejos
  5. Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California, USA.

    • Davide Risso & Sandrine Dudoit
  6. Department of Statistics, University of California, Berkeley, Berkeley, California, USA.

    • Sandrine Dudoit
  7. Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK.

    • John C Marioni
  8. Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    • John C Marioni

Authors

  1. Catalina A Vallejos

  2. Davide Risso

  3. Antonio Scialdone

  4. Sandrine Dudoit

  5. John C Marioni

Contributions

C.A.V., D.R., and A.S. performed analyses. C.A.V., D.R., A.S., S.D., and J.C.M. wrote the manuscript. S.D. and J.C.M. supervised the study.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Sandrine Dudoit or John C Marioni.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Data 1–3.

About this article

DOI

https://doi.org/10.1038/nmeth.4292
