Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

Journal name:
Nature Methods

The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging. We developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.
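The core idea of testing a gene set for coordinated variability among cells can be illustrated with a minimal Python sketch. It is not PAGODA's implementation and rests on several assumptions: the input is a cells × genes matrix already normalized so that gene variances are comparable; ordinary (unweighted) PCA stands in for the weighted PCA PAGODA uses; and the paper's significance model is replaced by a simple permutation null built from random gene sets of the same size. The function names `pc1_variance` and `gene_set_overdispersion` are hypothetical.

```python
import numpy as np


def pc1_variance(x):
    """Variance captured by the first principal component of x (cells x genes)."""
    centered = x - x.mean(axis=0)
    # Singular values come back sorted in descending order; s[0]**2 / (n - 1)
    # is the variance of the cell scores on the first principal component.
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / (x.shape[0] - 1)


def gene_set_overdispersion(expr, gene_idx, n_random=1000, seed=0):
    """Empirical test of coordinated variability for one gene set.

    expr: cells x genes matrix of normalized expression (assumed input).
    gene_idx: column indices of the genes in the set.
    Returns (observed PC1 variance, empirical P value) against random gene
    sets of the same size. This permutation null is a stand-in, not the
    significance model used by PAGODA itself.
    """
    rng = np.random.default_rng(seed)
    observed = pc1_variance(expr[:, gene_idx])
    k = len(gene_idx)
    null = np.array([
        pc1_variance(expr[:, rng.choice(expr.shape[1], size=k, replace=False)])
        for _ in range(n_random)
    ])
    # Add-one correction keeps the empirical P value strictly positive.
    p = (1 + np.sum(null >= observed)) / (n_random + 1)
    return observed, p
```

Comparing against random gene sets of the same size matters because the leading PC variance grows with the number of genes even in pure noise; a size-matched null keeps large gene sets from looking overdispersed by default. For example, on simulated data in which 20 genes share a latent factor across cells, the observed PC1 variance of those genes exceeds that of random 20-gene sets and the empirical P value is small.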

At a glance


    Figure 1: Overview of PAGODA.

    Transcriptional heterogeneity is analyzed in seven steps. (1) Error models are fit for each cell (ref. 19). A model fit for a cell is shown, separating drop-out and amplified components with the 95% confidence envelope (CE) of the amplified component. (2) The residual expression variance for each gene is determined relative to the transcriptome-wide expectation model (red curve), taking into account the uncertainty in the variance estimate for each gene by determining the effective degrees of freedom (k_g) for the χ² distribution. CV, coefficient of variation. (3) Weighted PCA is performed on annotated gene sets and on de novo gene sets determined on the basis of correlated expression in the current data set. (4) Cell PC scores of overdispersed gene sets (those with PC variance significantly higher than expected) are identified as significant aspects of heterogeneity. (5) Redundant aspects are grouped to provide a succinct overview of heterogeneity. (6) A web interface is used to navigate the identified aspects of heterogeneity, associated gene sets and gene expression patterns. (7) Aspects of heterogeneity deemed artifactual or extraneous with respect to the biological question can be controlled for in a subsequent iteration.

    Figure 2: PAGODA analysis of data from 3,005 mouse cortical and hippocampal cells (ref. 5).

    The dendrogram shows overall clustering, and the first row indicates group assignments from the original analysis (ref. 5). The rows below show the top nine significant aspects of heterogeneity (P < 0.05) detected by PAGODA using gene sets defined by GO annotations. Aspect scores (cell PC scores) are oriented so that high values generally correspond to increased expression of the associated gene sets. Row labels summarize key functional annotations of the gene sets in each aspect. Also shown are expression patterns of top-loading genes for innate immune response (from the aspect distinguishing neuroglia) and myelin sheath (distinguishing oligodendrocytes). A population of ~35 cells expressing both signatures is marked by a green bar and probably represents capture of two associated cells of different types. The images at the bottom show the microfluidic traps corresponding to some of the dual-signature cells, along with cells exhibiting only the oligodendrocyte signature (leftmost two images). Green numbered boxes below the uppermost panel highlight cells showing a combination of signatures of oligodendrocytes and other cell types (1–5 denote, respectively, vascular endothelial cells, astrocytes, CA1 neurons, Gad1/2 interneurons and neuroglia).

    Figure 3: Transcriptional heterogeneity of 65 NPCs in embryonic mouse cortex.

    (a) The top eight significant (P < 0.01) aspects of heterogeneity are shown, labeled by primary GO category or driving genes. The top aspect tracks the induction of neuronal maturation pathways, driving the overall subpopulation structure. Mitotic and S-phase signatures in early NPCs account for the next two most significant aspects, with the S-phase aspect incorporating closely matching expression patterns of genes responsible for NPC maintenance. The top panel summarizes key subpopulations of NPCs distinguished by the detected heterogeneity aspects. (b) Location of early versus maturing NPC classes within the embryonic brain. In situ hybridizations in E13.5 mouse brain are shown for Tyro3 and Nfasc, with the two heat map rows at the top showing scRNA-seq expression. Computational prediction (rightmost panels in image rows) based on the overall transcriptional profile placed early NPCs near the ventricular zone (VZ) and maturing ones in the subventricular zone (SVZ) and cortical plate (CP). In situ images were generated by the Allen Institute for Brain Science (ref. 23). The bottom row of images shows the anatomical placement of the Dlx-expressing NPCs and in situ images for the associated genes. GE, ganglionic eminence.
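Step 2 of the workflow in Figure 1, testing each gene's variance against a transcriptome-wide expectation with k_g effective degrees of freedom for a χ² distribution, can be sketched as follows. This is a hedged illustration, not the SCDE/PAGODA code: a quadratic fit of log variance on log mean stands in for the paper's expectation model (red curve), the input is assumed to be normalized expression with positive gene means, and `df` defaults to n_cells − 1 (the degrees of freedom of an ordinary sample variance), whereas PAGODA derives an effective k_g per gene from its error models. The name `residual_variance_test` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def residual_variance_test(expr, df=None, degree=2):
    """Test each gene's variance against a transcriptome-wide mean-variance trend.

    expr: cells x genes matrix of normalized expression; every gene mean
        must be positive because the trend is fit on the log scale.
    df: effective degrees of freedom per gene (k_g). Defaults to n_cells - 1,
        the value for an ordinary sample variance (an assumption here).
    degree: degree of the polynomial trend of log variance on log mean
        (a stand-in for the expectation model used in the paper).
    Returns (observed/expected variance ratio, upper-tail P value) per gene.
    """
    n_cells, n_genes = expr.shape
    if df is None:
        df = np.full(n_genes, n_cells - 1, dtype=float)
    mean = expr.mean(axis=0)
    var = expr.var(axis=0, ddof=1)
    # Fit the expected log variance as a smooth function of log mean.
    log_m = np.log(mean)
    coef = np.polyfit(log_m, np.log(var), deg=degree)
    expected = np.exp(np.polyval(coef, log_m))
    ratio = var / expected
    # Under the null, k_g * observed / expected follows chi-squared with k_g df.
    p = chi2.sf(df * ratio, df)
    return ratio, p
```

The statistic relies on the classical result that, for normally distributed data, (n − 1)s²/σ² follows a χ² distribution with n − 1 degrees of freedom. Replacing n − 1 with a smaller k_g widens that distribution, so genes with less certain variance estimates need a larger excess over the trend to reach significance.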

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus


  1. Islam, S. et al. Nat. Methods 11, 163–166 (2014).
  2. Picelli, S. et al. Nat. Methods 10, 1096–1098 (2013).
  3. Tang, F. et al. PLoS ONE 6, e21208 (2011).
  4. Usoskin, D. et al. Nat. Neurosci. 18, 145–153 (2015).
  5. Zeisel, A. et al. Science 347, 1138–1142 (2015).
  6. Buettner, F. et al. Nat. Biotechnol. 33, 155–160 (2015).
  7. Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
  8. Klein, A.M. et al. Cell 161, 1187–1201 (2015).
  9. Patel, A.P. et al. Science 344, 1396–1401 (2014).
  10. Grün, D., Kester, L. & van Oudenaarden, A. Nat. Methods 11, 637–640 (2014).
  11. Buettner, F. & Theis, F.J. Bioinformatics 28, i626–i632 (2012).
  12. van der Maaten, L.J.P. & Hinton, G.E. J. Mach. Learn. Res. 9, 2579–2605 (2008).
  13. Jaitin, D.A. et al. Science 343, 776–779 (2014).
  14. Subramanian, A., Kuehn, H., Gould, J., Tamayo, P. & Mesirov, J.P. Bioinformatics 23, 3251–3253 (2007).
  15. Blaschke, A.J., Staley, K. & Chun, J. Development 122, 1165–1174 (1996).
  16. Rehen, S.K. et al. Proc. Natl. Acad. Sci. USA 98, 13361–13366 (2001).
  17. Peterson, S.E. et al. J. Neurosci. 32, 16213–16222 (2012).
  18. Herr, K.J., Herr, D.R., Lee, C.W., Noguchi, K. & Chun, J. Proc. Natl. Acad. Sci. USA 108, 15444–15449 (2011).
  19. Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Nat. Methods 11, 740–742 (2014).
  20. Pollen, A.A. et al. Nat. Biotechnol. 32, 1053–1058 (2014).
  21. Kawaguchi, A. et al. Development 135, 3113–3124 (2008).
  22. Kriegstein, A., Noctor, S. & Martínez-Cerdeño, V. Nat. Rev. Neurosci. 7, 883–890 (2006).
  23. Lein, E.S. et al. Nature 445, 168–176 (2007).
  24. Englund, C. et al. J. Neurosci. 25, 247–251 (2005).
  25. Uetsuki, T., Takagi, K., Sugiura, H. & Yoshikawa, K. J. Biol. Chem. 271, 918–924 (1996).
  26. Minamide, R., Fujiwara, K., Hasegawa, K. & Yoshikawa, K. PLoS ONE 9, e84460 (2014).
  27. Huang, Z., Fujiwara, K., Minamide, R., Hasegawa, K. & Yoshikawa, K. J. Neurosci. 33, 10362–10373 (2013).
  28. Anderson, S.A., Eisenstat, D.D., Shi, L. & Rubenstein, J.L. Science 278, 474–476 (1997).
  29. Wonders, C.P. & Anderson, S.A. Nat. Rev. Neurosci. 7, 687–696 (2006).
  30. Ma, T. et al. Cereb. Cortex 22, 2120–2130 (2012).
  31. Anders, S. & Huber, W. Genome Biol. 11, R106 (2010).
  32. Fisher, R.A. Statistical Methods for Research Workers (Hafner, 1970).
  33. Abdel, H.E. Encyclopedia of Environmetrics 2nd edn (Wiley, 2012).
  34. Hastings, C., Mosteller, F., Tukey, J.W. & Winsor, C.P. Ann. Math. Stat. 18, 413–426 (1947).
  35. Bailey, S. Publ. Astron. Soc. Pac. 124, 1023 (2012).
  36. Johnstone, I.M. Ann. Stat. 29, 295–327 (2001).
  37. Benjamini, Y. & Hochberg, Y. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
  38. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. & Regev, A. Nat. Biotechnol. 33, 495–502 (2015).
  39. Achim, K. et al. Nat. Biotechnol. 33, 503–509 (2015).


Author information


  1. Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.

    • Jean Fan,
    • Joseph L Herman &
    • Peter V Kharchenko
  2. Illumina Inc., San Diego, California, USA.

    • Neeraj Salathia,
    • Fiona Kaper &
    • Jian-Bing Fan
  3. Department of Bioengineering, University of California, San Diego, California, USA.

    • Rui Liu &
    • Kun Zhang
  4. Department of Molecular and Cellular Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, California, USA.

    • Gwendolyn E Kaeser,
    • Yun C Yung &
    • Jerold Chun
  5. Harvard Stem Cell Institute, Cambridge, Massachusetts, USA.

    • Peter V Kharchenko
  6. Present address: AnchorDx Corporation, International Biotech Island, Guangzhou, Guangdong, China.

    • Jian-Bing Fan


K.Z., J.C. and P.V.K. conceived the study. N.S., R.L., G.E.K., Y.C.Y., F.K. and J.-B.F. carried out the single-cell purification and RNA-seq measurements. G.E.K. and J.C. carried out RNAscope in situ validation. J.F. and P.V.K. designed and implemented the statistical analysis approach, with the help of J.L.H. P.V.K. and J.F. wrote the manuscript with the help of J.C. and K.Z.

Competing financial interests

N.S. and F.K. are current employees and shareholders of Illumina, Inc.

Corresponding author

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (9,579 KB)

    Supplementary Figures 1–5 and Supplementary Notes 1–3

Zip files

  1. Supplementary Software (1,906 KB)

    Source code: SCDE R Package
