Bayesian approach to single-cell differential expression analysis

Journal name: Nature Methods

Single-cell data provide a means to dissect the composition of complex tissues and specialized cellular environments. However, the analysis of such measurements is complicated by high levels of technical noise and intrinsic biological variability. We describe a probabilistic model of expression-magnitude distortions typical of single-cell RNA-sequencing measurements, which enables detection of differential expression signatures and identification of subpopulations of cells in a way that is more tolerant of noise.

At a glance


  1. Figure 1: Modeling single-cell RNA-seq measurement.

    (a) Smoothed scatter plot comparing gene-expression estimates from two MEFs, illustrating the types of cell-to-cell variability observed. RPM, reads per million. (b) Plots showing expression of Rnaseh2a and Bmp4, as examples of top differentially expressed genes, from CuffDiff2 (ref. 14) comparison of ten ES and ten MEF cells. Triangles show expression magnitudes observed in different cells, and whiskers span the range of observed expression magnitudes. (c) Plot showing a cross-comparison of single-cell measurements in cells of the same type, determining whether the transcript is likely to have been successfully amplified in both experiments (correlated component). (d) Plot showing read counts observed for a particular cell (y axis) relative to the expected expression magnitude (x axis; see c). The measurement is modeled as a mixture of dropout (red) and successful amplification processes (blue), with magnitude-dependent mixing of the two processes. (e,f) Probability of transcript-detection failures (dropout events) as a function of expression magnitude for individual ES and MEF cells (e; ref. 2) and for individual cells from 4-, 8- and 16-cell embryos (f; ref. 12).

  2. Figure 2: Applying single-cell models for differential expression and subpopulation analyses.

    (a) Expression differences of Sox2 between all ES and MEF cells, measured by Islam et al. (ref. 2). The plots show posterior probability (y axis, probability density) of expression magnitudes in mouse ES (mES, top) and MEF (bottom) cells. The model fitted for each single cell is used to estimate the likelihood that a gene is expressed at any particular level, given the observed data (red or blue curves). The black curve shows the estimated joint posterior distribution for the overall level for each cell type. The posterior probability of the fold-expression difference is shown in the middle plot with the associated raw P value (two-sided) of differential expression. (b) Expression differences of Dazl between cells of 8-cell and 16-cell mouse embryo stages (ref. 12), as in a. A regulatory factor expressed in mammalian embryos (refs. 19,20), Dazl is expressed at earlier stages and shows a drop-off between 8- and 16-cell stages. (c) Receiver operating characteristic curves comparing the ability to detect differentially expressed genes, with bulk expression measurements as a benchmark (ref. 17). SCA, single-cell assay (ref. 15); AUC, area under curve. (d) Performance of error model–based transcriptional similarity measures in distinguishing ES and MEF cell types. The plot shows the fraction of correctly classified cells, assessed for increasingly difficult classification problems by iterative exclusion of up to 7,000 of the most informative genes (i.e., genes differentially expressed between ES and MEF, x axis). The 95% confidence bands (of the mean) are shown in light shading.
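
The two captions above describe a per-cell mixture of dropout and amplification processes, a joint posterior over expression level for each cell type, and a posterior over the fold difference. The sketch below illustrates that idea under loudly stated assumptions; it is not the authors' implementation. The Poisson dropout component, the negative binomial amplification component, the logistic dropout curve, every parameter value (`background`, `size`, `midpoint`, `slope`), the flat prior, the independence of cells, and the read counts are all hypothetical choices made for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k reads at rate lam (assumed dropout component)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def negbin_pmf(k, mean, size):
    """Negative binomial probability of k reads (assumed amplification component)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mean))
             + k * math.log(mean / (size + mean)))
    return math.exp(log_p)

def dropout_probability(level, midpoint=1.0, slope=2.0):
    """Hypothetical logistic curve: dropout gets less likely as expression rises."""
    return 1.0 / (1.0 + math.exp(slope * (math.log10(level) - midpoint)))

def count_likelihood(count, level, background=0.1, size=5.0):
    """Mixture likelihood of one cell's read count given a true expression level."""
    p_drop = dropout_probability(level)
    return (p_drop * poisson_pmf(count, background)
            + (1.0 - p_drop) * negbin_pmf(count, level, size))

def joint_posterior(counts, grid):
    """Posterior over grid levels, assuming a flat prior and independent cells."""
    log_post = []
    for level in grid:
        # Floor each likelihood to avoid log(0) after floating-point underflow.
        log_post.append(sum(math.log(max(count_likelihood(c, level), 1e-300))
                            for c in counts))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def log2_fold_posterior(post_a, post_b, grid):
    """Distribution of log2(level_a / level_b) for two independent posteriors."""
    dist = {}
    for level_a, p_a in zip(grid, post_a):
        for level_b, p_b in zip(grid, post_b):
            key = round(math.log2(level_a / level_b), 6)
            dist[key] = dist.get(key, 0.0) + p_a * p_b
    return dist

# Candidate expression levels on a log2-spaced grid (0.25 to 1024).
grid = [2.0 ** i for i in range(-2, 11)]

# Made-up read counts for one gene; zeros in the first group mimic dropouts.
es_counts = [120, 0, 95, 150, 0, 110]
mef_counts = [0, 2, 0, 1, 0, 3]

es_post = joint_posterior(es_counts, grid)
mef_post = joint_posterior(mef_counts, grid)
fold = log2_fold_posterior(es_post, mef_post, grid)

es_map = grid[es_post.index(max(es_post))]
mef_map = grid[mef_post.index(max(mef_post))]
prob_higher_in_es = sum(p for lf, p in fold.items() if lf > 0)
print("most probable levels:", es_map, mef_map)
print("posterior probability of higher expression in group 1:", prob_higher_in_es)
```

In this toy setting, the zero counts in the first group are absorbed by the dropout component rather than pulling the estimated level toward zero, which is the noise tolerance the abstract refers to. A real analysis would fit the dropout curve and dispersion per cell from the data instead of fixing them.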


  1. Tang, F. et al. Nat. Methods 6, 377–382 (2009).
  2. Islam, S. et al. Genome Res. 21, 1160–1167 (2011).
  3. Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. Cell Reports 2, 666–673 (2012).
  4. Ramsköld, D. et al. Nat. Biotechnol. 30, 777–782 (2012).
  5. Dalerba, P. et al. Nat. Biotechnol. 29, 1120–1127 (2011).
  6. Tang, F. et al. PLoS ONE 6, e21208 (2011).
  7. Brouilette, S. et al. Dev. Dyn. 241, 1584–1590 (2012).
  8. Buganim, Y. et al. Cell 150, 1209–1222 (2012).
  9. Munsky, B., Neuert, G. & van Oudenaarden, A. Science 336, 183–187 (2012).
  10. Brennecke, P. et al. Nat. Methods 10, 1093–1095 (2013).
  11. Wills, Q.F. et al. Nat. Biotechnol. 31, 748–752 (2013).
  12. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Science 343, 193–196 (2014).
  13. Anders, S. & Huber, W. Genome Biol. 11, R106 (2010).
  14. Trapnell, C. et al. Nat. Biotechnol. 31, 46–53 (2013).
  15. McDavid, A. et al. Bioinformatics 29, 461–467 (2013).
  16. Robinson, M.D. & Smyth, G.K. Bioinformatics 23, 2881–2887 (2007).
  17. Moliner, A., Ernfors, P., Ibanez, C.F. & Andäng, M. Stem Cells Dev. 17, 233–243 (2008).
  18. Tischler, J. & Surani, M.A. Curr. Opin. Biotechnol. 24, 69–78 (2013).
  19. Cauffman, G. et al. Mol. Hum. Reprod. 11, 405–411 (2005).
  20. Pan, H.A. et al. Fertil. Steril. 89, 1324–1327 (2008).
  21. Trapnell, C., Pachter, L. & Salzberg, S.L. Bioinformatics 25, 1105–1111 (2009).
  22. Grün, B., Scharl, T. & Leisch, F. Bioinformatics 28, 222–228 (2012).
  23. Andäng, M., Moliner, A., Doege, C.A., Ibanez, C.F. & Ernfors, P. Nat. Protoc. 3, 1013–1017 (2008).

Author information


  1. Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

    • Peter V Kharchenko
  2. Hematology/Oncology Program, Children's Hospital, Boston, Massachusetts, USA.

    • Peter V Kharchenko
  3. Harvard Stem Cell Institute, Cambridge, Massachusetts, USA.

    • Peter V Kharchenko,
    • Lev Silberstein &
    • David T Scadden
  4. Center for Regenerative Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA.

    • Lev Silberstein &
    • David T Scadden
  5. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA.

    • Lev Silberstein &
    • David T Scadden


P.V.K. conceived and implemented the computational approach. L.S. and D.T.S. designed and carried out the initial experimental study that led to the development of the presented approach.

Competing financial interests

D.T.S. is a shareholder in Fate Therapeutics and is a consultant for Fate Therapeutics, Hospira, GlaxoSmithKline and Bone Therapeutics. The remaining authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (1486 KB)

    Supplementary Figures 1–5

Zip files

  1. Supplementary Software (410 KB)

    Software for single-cell differential analysis.
