Analysis

Bias, robustness and scalability in single-cell differential expression analysis

Nature Methods volume 15, pages 255–261 (2018)

Abstract

Many methods have been used to determine differential gene expression from single-cell RNA (scRNA)-seq data. We evaluated 36 approaches using experimental and synthetic data and found considerable differences in the number and characteristics of the genes that are called differentially expressed. Prefiltering of lowly expressed genes has important effects, particularly for some of the methods developed for bulk RNA-seq data analysis. However, we found that bulk RNA-seq analysis methods do not generally perform worse than those developed specifically for scRNA-seq. We also present conquer, a repository of consistently processed, analysis-ready public scRNA-seq data sets that is aimed at simplifying method evaluation and reanalysis of published results. Each data set provides abundance estimates for both genes and transcripts, as well as quality control and exploratory analysis reports.
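As a rough illustration of what prefiltering lowly expressed genes can look like, the following is a minimal sketch and not the authors' pipeline. The function name `prefilter_genes`, the thresholds `min_expr` and `min_frac_cells`, and the toy matrix are all assumptions made for this example.

```python
import numpy as np


def prefilter_genes(expr, min_expr=1.0, min_frac_cells=0.25):
    """Return a boolean mask marking the genes to keep.

    expr: 2-D array with genes in rows and cells in columns.
    A gene is kept if its value exceeds `min_expr` in more than
    `min_frac_cells` of the cells. Both thresholds are illustrative
    assumptions, not values prescribed by the text above.
    """
    expr = np.asarray(expr, dtype=float)
    # Fraction of cells in which each gene exceeds the expression threshold.
    frac_expressed = (expr > min_expr).mean(axis=1)
    return frac_expressed > min_frac_cells


# Toy matrix with made-up values: 3 genes x 4 cells.
toy = np.array([
    [0.0, 0.0, 0.0, 5.0],  # expressed in 1 of 4 cells
    [2.0, 3.0, 0.0, 4.0],  # expressed in 3 of 4 cells
    [0.0, 0.5, 0.2, 0.0],  # never above the threshold
])
keep = prefilter_genes(toy)
filtered = toy[keep]  # only rows (genes) passing the filter remain
```

With these assumed thresholds only the second gene is retained. Changing either threshold changes which genes reach the differential expression test, which is one way such a filter can affect the results of downstream methods.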

Acknowledgements

The authors acknowledge M. Love and V. Svensson for helpful online instructions regarding automated download of raw data from ENA. This study was supported by the Forschungskredit of the University of Zurich, grant no. FK-16-107 to C.S.

Author information

Affiliations

  1. Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.

    • Charlotte Soneson
    • Mark D Robinson
  2. SIB Swiss Institute of Bioinformatics, Zurich, Switzerland.

    • Charlotte Soneson
    • Mark D Robinson

Contributions

C.S. and M.D.R. designed analyses and wrote the manuscript. C.S. performed analyses. Both authors read and approved the final manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Charlotte Soneson or Mark D Robinson.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–32 and Supplementary Tables 1–3

  2. Life Sciences Reporting Summary

Zip files

  1. Supplementary Data

    countsimQC reports illustrating the similarity between each simulated data set and the underlying real data set

  2. Supplementary Software

    Snapshot (at time of publication) of the two GitHub repositories containing the code used to build the conquer database (https://github.com/markrobinsonuzh/conquer) and to perform the method comparison (https://github.com/csoneson/conquer_comparison)

About this article

DOI

https://doi.org/10.1038/nmeth.4612