Diffusion pseudotime robustly reconstructs lineage branching

Journal name: Nature Methods
Volume: 13
Pages: 845–848
DOI: 10.1038/nmeth.3971

The temporal order of differentiating cells is intrinsically encoded in their single-cell expression profiles. We describe an efficient way to robustly estimate this order according to diffusion pseudotime (DPT), which measures transitions between cells using diffusion-like random walks. Our DPT software implementations make it possible to reconstruct the developmental progression of cells and identify transient or metastable states, branching decisions and differentiation endpoints.
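
The random-walk idea behind DPT can be illustrated with a small NumPy sketch. This is a simplified reading of the method, not the code shipped with the DPT software, and it makes three assumptions:

- a single global Gaussian kernel width `sigma` (the published method may adapt widths per cell and apply further normalization);
- a symmetrically normalized transition matrix T;
- dense linear algebra, so it only suits small data sets.

The helper name `dpt_distances` is hypothetical. The sketch sums transitions over all walk lengths in closed form, M = (I − T + ψ0ψ0ᵀ)⁻¹ − I, where ψ0 is the eigenvector of T with eigenvalue 1. It then takes dpt(x,y) as the Euclidean distance between rows x and y of M.

```python
import numpy as np


def dpt_distances(X, sigma=1.0):
    """Pairwise DPT-style distances between cells (rows of X).

    Hypothetical helper with simplifying assumptions: one global Gaussian
    kernel width `sigma`, symmetric normalization, dense matrices.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of cells.
    sq_norms = np.sum(X ** 2, axis=1)
    sq = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel between every pair of cells.
    K = np.exp(-sq / (2.0 * sigma ** 2))
    # Symmetric normalization T = D^(-1/2) K D^(-1/2); its largest eigenvalue is 1.
    d = K.sum(axis=1)
    T = K / np.sqrt(np.outer(d, d))
    # Eigenvector of T with eigenvalue 1 (the stationary component).
    psi0 = np.sqrt(d) / np.linalg.norm(np.sqrt(d))
    identity = np.eye(n)
    # Closed form of sum_{t>=1} (T^t - psi0 psi0^T): accumulated transitions.
    M = np.linalg.inv(identity - T + np.outer(psi0, psi0)) - identity
    # dpt(x, y) = Euclidean distance between rows x and y of M.
    G = M @ M.T
    g = np.diag(G)
    return np.sqrt(np.maximum(g[:, None] + g[None, :] - 2.0 * G, 0.0))


# Toy example: 30 cells spread evenly along a one-dimensional trajectory.
cells = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
dpt = dpt_distances(cells, sigma=0.1)
pseudotime = dpt[0]  # DPT from a chosen root cell (here, cell 0)
```

In the toy example, `pseudotime` is zero at the root cell and much larger at the far end of the line than near the root. Removing the ψ0 component keeps the infinite sum finite, because every remaining eigenvalue of T is below 1 when the kernel graph is connected.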

Figures

  1. Figure 1: Diffusion pseudotime reveals temporal ordering and cellular decisions on the single-cell level.

    (a) The diffusion transition matrix T_xy is constructed by computing the overlap of local kernels at the expression levels of cells x and y (1). Diffusion pseudotime dpt(x,y) approximates the geodesic distance between x and y on the mapped manifold (2). Branching points are identified as points where anticorrelated distances from branch ends become correlated (3). (b) Application of DPT to single-cell qPCR data of 42 genes in 3,934 single cells during early hematopoiesis (ref. 13), sorted from primitive streak (PS), neural plate (NP), head fold (HF), four-somite GFP-negative (4SG−) and four-somite GFP-positive (4SG+) populations. DPT identifies the endothelial branch 1 (4SG−) and the erythroid branch 2 (4SG+) (blue cells in bottom graphs). (c) Dynamics of the genes Erg and Ikaros in both branches. Black lines show the moving average over 50 adjacent cells; the red vertical line marks the branching point. (d) Heatmap of gene expression (smoothed over 50 adjacent cells), with cells ordered by DPT and branch and genes ordered by their first major change (see Supplementary Note 2, section 2), which is indicated by black triangles (upward: activation; downward: deactivation). Pie charts (bottom) show the fraction of cells in the four metastable states (metastable state populations are high-density DPT regions, indicated by the black horizontal line above the pie charts).

  2. Figure 2: Diffusion pseudotime identifies differentiation dynamics from scRNA-seq data (ref. 9).

    (a) Mouse ESCs were harvested at T = 0, 2, 4 and 7 d after LIF withdrawal and profiled with the inDrop protocol, yielding 2,717 cells with 24,175 observed unique transcripts (ref. 9). Visualization using diffusion maps shows the temporal dynamics across the 7 d. (b) Pseudotemporal dynamics of the expression of selected genes; compare with Figure 7b of Klein et al. (ref. 9). (c) Heatmap of gene expression, with cells ordered by DPT and branch (branches separated by white vertical bars) and genes ordered by hierarchical clustering (see Supplementary Fig. 4b); clusters are indicated by the color bar on the right. (d) Gene ontology enrichment shows a cellular reorganization signature (orange), a metabolic signature for differentiation (purple) and a cell motility signature (yellow). (e) Robustness of DPT, Wanderlust/Wishbone (Wand./Wish.) and Monocle, compared by a self-concordance measure on bootstrap samples for several data sets (Supplementary Note 5). DPT shows higher robustness (self-concordance) on all data sets (two-sided t-test, P < 0.001 in all comparisons except the nonsignificant (n.s.) difference from Wishbone on the artificial data). (f) Kendall rank correlation of pseudotime with experimental days. DPT orders cells (see Online Methods) significantly better than Wanderlust/Wishbone does (Kendall rank correlation 0.77 ± 0.001 versus 0.70 ± 0.001). (e,f) Boxes indicate the median and the first and third quartiles, whiskers extend to ±1.5× the interquartile range divided by the square root of the number of observations, and single points denote measurements outside this range.

  3. Supplementary Fig. 1: Metastable states of mouse early blood development qPCR data

    a) Diffusion map plot illustrating four metastable states along the pseudotemporal ordering. Lower right: precursor state. Left: tip of branch 1. Upper right: decision state (light gray) and tip of branch 2 (dark gray). b) Histogram of the cell density along the branches. Blue bars: branch 1; black bars: branch 2. Both branches share the precursor branch up to the decision state (gray bars).

  4. Supplementary Fig. 2: Differential expression analysis using MAST on mESC inDrop data

    Log-fold change (lfc) analysis of the DPT-inferred ‘decision’ group vs. the other DPT-inferred groups (a,c,e) and of head fold cells vs. primitive streak and 4SG− cells (b,d). The displayed genes were filtered for lfc > 1 and a Bonferroni-adjusted P < 0.01. Plots are ordered by absolute lfc between the states. a) Decision area (red) vs. precursor area (blue). b) Head fold (red) vs. primitive streak (blue). c) Decision area (red) vs. branch 2 end point (blue). d) Head fold (red) vs. 4SG-negative cells (blue). e) Decision area (red) vs. branch 1 end point (blue).

  5. Supplementary Fig. 3: Influence of cell-cycle correction on data clustering and GO enrichment

    a,b) Total transcript counts from 2,047 heterogeneous genes per day: a) log-normalized counts before cell-cycle correction; b) log-normalized counts after cell-cycle correction. c) Fit of the CV²–mean relation according to Brennecke et al. [11] to a pure RNA control and d) superposition of these technical genes with endogenous genes. e) Variance decomposition according to the identified latent variables. f) Detailed variance decomposition sorted by technical noise contribution.

  6. Supplementary Fig. 4: Expression profiles of highly variable genes before and after cell-cycle correction and pseudotime ordering of mESC inDrop data

    Heatmaps displaying the expression profiles of 2,047 highly variable genes before (a) and after (b,c) cell-cycle correction and pseudotime ordering, time courses of gene expression along batch (d) and pseudotime (e,f), and GO enrichment analysis of the clusters in a and c (g,h). The colored top bar (a–c) indicates the time after LIF withdrawal (dark blue: day 0; light blue: day 2; yellow: day 4; red: day 7). a) Gene expression with strong day-to-day variability. b) Cell-cycle-corrected gene expression with additional quantile normalization. c) Cell-cycle-corrected gene expression with additional Z-score normalization. Pseudotemporal ordering is indicated by mixed colors in the top annotation bar. In the time courses, the respective genes are shown in gray and the black curve is the smoothed mean. d) Log-transformed gene expression counts. e) Cell-cycle correction, log-transformed gene expression counts and quantile normalization (cf. Fig. 2d in main text). f) As in e), with Z-score normalization. All clusters share the same temporal behavior. The GO terms of the green cluster are not shown. For each cluster, five representative GO terms are displayed. g) GO terms before cell-cycle correction. h) GO terms after cell-cycle correction and Z-score normalization. i) Distribution of cells along pseudotime, labeled by time after LIF withdrawal.

  7. Supplementary Fig. 5: P values of the Wilcoxon rank-sum test applied to the first population that branches off the main branch in mESC inDrop data

    Shown are the 20 apoptosis-related genes (GO:0006915) among the 108 genes identified by the Wilcoxon rank-sum test. The test compares cells from the early-state population (see text) with cells from the first population that branches off the main branch.
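
The smoothing in the Figure 1c,d legends (a moving average over 50 adjacent cells ordered by DPT) can be sketched in NumPy as follows. The helper `smooth_along_pseudotime` is hypothetical. It keeps only full windows, so 50-cell smoothing of n cells yields n − 49 values. How the published figures treat the edges is not stated.

```python
import numpy as np


def smooth_along_pseudotime(pseudotime, expression, window=50):
    """Moving average of one gene's expression over `window` adjacent cells,
    with cells ordered by pseudotime (hypothetical helper)."""
    t = np.asarray(pseudotime, dtype=float)
    y = np.asarray(expression, dtype=float)
    if window > t.size:
        raise ValueError("window is larger than the number of cells")
    # Order cells by pseudotime; 'stable' keeps input order for equal values.
    order = np.argsort(t, kind="stable")
    kernel = np.ones(window) / window
    # 'valid' keeps only full windows, giving n - window + 1 smoothed values.
    smoothed = np.convolve(y[order], kernel, mode="valid")
    # Pair each smoothed value with the mean pseudotime of its window.
    t_mid = np.convolve(t[order], kernel, mode="valid")
    return t_mid, smoothed
```

For a real data set, call it with the default `window=50` on one gene at a time and plot `smoothed` against `t_mid`.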
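
Figure 2f scores orderings by their Kendall rank correlation with the experimental day. Because many cells share the same day, a tie-aware variant is needed. The sketch uses tau-b, which is an assumption, since the legend does not name the variant. The helper `kendall_tau_b` is hypothetical and compares all pairs of cells, so its memory use grows quadratically with the number of cells.

```python
import numpy as np


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (corrects for ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sign of the difference for every ordered pair of cells (i, j).
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Count each unordered pair once (upper triangle, i < j).
    iu = np.triu_indices(x.size, k=1)
    sx, sy = sx[iu], sy[iu]
    concordant_minus_discordant = np.sum(sx * sy)
    untied_x = np.count_nonzero(sx)  # pairs not tied in x
    untied_y = np.count_nonzero(sy)  # pairs not tied in y
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined when one variable is constant
    return float(concordant_minus_discordant / np.sqrt(untied_x * untied_y))


# Pseudotime of four cells versus their (tied) collection days.
tau = kendall_tau_b([0.1, 0.2, 0.3, 0.4], [0, 0, 2, 2])
```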
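
The gene filter described for Supplementary Fig. 2 (lfc > 1, Bonferroni-adjusted P < 0.01, panels ordered by absolute lfc) is a small post-processing step. The sketch makes two assumptions:

- It does not call MAST, which is an R package. Per-gene log-fold changes and raw P values are assumed to have been computed already.
- It applies the lfc threshold to the absolute value, because the panels are ordered by absolute lfc.

The helper `filter_de_genes` is hypothetical.

```python
import numpy as np


def filter_de_genes(genes, lfc, pvalues, lfc_threshold=1.0, alpha=0.01):
    """Keep genes with |lfc| > threshold and Bonferroni-adjusted P < alpha,
    ordered by decreasing absolute log-fold change (hypothetical helper)."""
    lfc = np.asarray(lfc, dtype=float)
    p = np.asarray(pvalues, dtype=float)
    # Bonferroni correction: multiply by the number of tests, cap at 1.
    p_adj = np.minimum(p * p.size, 1.0)
    keep = np.flatnonzero((np.abs(lfc) > lfc_threshold) & (p_adj < alpha))
    # Sort the retained genes by absolute log-fold change, largest first.
    keep = keep[np.argsort(-np.abs(lfc[keep]), kind="stable")]
    return [genes[i] for i in keep], p_adj[keep]
```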
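
Supplementary Fig. 3c fits a CV²–mean relation to technical genes following Brennecke et al. The minimal sketch below assumes the model form CV² = a1/μ + a0 and fits it by ordinary least squares. The original work may use a different fitting procedure, for example a generalized linear model. Both helper names are hypothetical.

```python
import numpy as np


def mean_and_cv2(counts):
    """Per-gene mean and squared coefficient of variation (genes x cells)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # sample variance across cells
    with np.errstate(divide="ignore", invalid="ignore"):
        cv2 = var / mean ** 2
    return mean, cv2


def fit_technical_cv2(mean, cv2):
    """Least-squares fit of CV^2 = a1 / mean + a0 to technical genes."""
    mean = np.asarray(mean, dtype=float)
    cv2 = np.asarray(cv2, dtype=float)
    ok = mean > 0  # genes with zero mean cannot enter the 1/mean term
    design = np.column_stack([1.0 / mean[ok], np.ones(ok.sum())])
    (a1, a0), *_ = np.linalg.lstsq(design, cv2[ok], rcond=None)
    return float(a1), float(a0)
```

Endogenous genes whose CV² lies well above the fitted technical curve at the same mean are candidates for biological variability.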
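
Supplementary Fig. 4b,c combine cell-cycle correction with quantile or Z-score normalization. The two normalizations are sketched below for a genes × cells matrix. The axes are assumptions, not stated in the legend:

- Z-scores are computed per gene across cells.
- Quantile normalization forces every cell (column) onto a common value distribution.
- Ties are broken by position rather than averaged.

Both helper names are hypothetical.

```python
import numpy as np


def zscore_rows(X):
    """Z-score each gene (row) across cells; constant genes become 0."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant genes
    return (X - mu) / sd


def quantile_normalize_columns(X):
    """Give every column the same distribution: the mean of the sorted columns."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0, kind="stable")
    # Reference distribution: average of the k-th smallest values across columns.
    reference = np.sort(X, axis=0).mean(axis=1)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        # The k-th smallest entry of column j receives reference[k].
        out[order[:, j], j] = reference
    return out
```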
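
Supplementary Fig. 5 relies on a Wilcoxon rank-sum test between two cell populations, applied gene by gene. The self-contained sketch below handles one gene. It uses average ranks for ties and the large-sample normal approximation without tie or continuity correction. The helper names are hypothetical, and the supplement does not say which implementation or variant was used.

```python
import math

import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    i = 0
    while i < values.size:
        j = i
        # Extend j over a run of equal values.
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation) for one gene."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    ranks = average_ranks(np.concatenate([a, b]))
    w = ranks[:n1].sum()  # rank sum of the first population
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12.0  # no tie correction in this sketch
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

With thousands of genes tested, the resulting P values would normally be corrected for multiple testing before genes are reported.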

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Shalek, A.K. et al. Nature 498, 236–240 (2013).
  2. Moignard, V. et al. Nat. Cell Biol. 15, 363–372 (2013).
  3. Treutlein, B. et al. Nature 509, 371–375 (2014).
  4. Magwene, P.M., Lizardi, P. & Kim, J. Bioinformatics 19, 842–850 (2003).
  5. Trapnell, C. et al. Nat. Biotechnol. 32, 381–386 (2014).
  6. Bendall, S.C. et al. Cell 157, 714–725 (2014).
  7. Setty, M. et al. Nat. Biotechnol. 34, 637–645 (2016).
  8. Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
  9. Klein, A.M. et al. Cell 161, 1187–1201 (2015).
  10. Paul, F. et al. Cell 163, 1663–1677 (2015).
  11. Coifman, R.R. et al. Proc. Natl. Acad. Sci. USA 102, 7426–7431 (2005).
  12. Haghverdi, L., Buettner, F. & Theis, F.J. Bioinformatics 31, 2989–2998 (2015).
  13. Moignard, V. et al. Nat. Biotechnol. 33, 269–276 (2015).
  14. Huber, T.L., Kouskoff, V., Fehling, H.J., Palis, J. & Keller, G. Nature 432, 625–630 (2004).
  15. Costa, G., Kouskoff, V. & Lacaud, G. Trends Immunol. 33, 215–223 (2012).
  16. Finak, G. et al. Genome Biol. 16, 278 (2015).
  17. Gut, G., Tadmor, M.D., Pe'er, D., Pelkmans, L. & Liberali, P. Nat. Methods 12, 951–954 (2015).
  18. Angerer, P. et al. Bioinformatics 32, 1241–1243 (2016).
  19. von Luxburg, U. Stat. Comput. 17, 395–416 (2007).
  20. Buettner, F. et al. Nat. Biotechnol. 33, 155–160 (2015).


Author information

Affiliations

  1. Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

    • Laleh Haghverdi,
    • Maren Büttner,
    • F Alexander Wolf,
    • Florian Buettner &
    • Fabian J Theis
  2. Department of Mathematics, Technische Universität München, Munich, Germany.

    • Laleh Haghverdi &
    • Fabian J Theis
  3. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Florian Buettner

Contributions

L.H. developed the method and the computational tools, performed the analysis and wrote the paper and the supplement. M.B. contributed to the analysis and biological interpretation of the results and wrote the supplement. F.A.W. helped interpret the results, helped write the supplement and wrote the paper. F.B. helped interpret the results. F.J.T. conceived and supervised the study, contributed to the method development and wrote the paper with help from all coauthors.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Supplementary information

PDF files

  1. Supplementary Text and Figures (7,058 KB)

    Supplementary Figures 1–5, Supplementary Notes 1–7, and Supplementary Tables 1 and 2

Zip files

  1. Supplementary Data (234,537 KB)

    Processed inDrop data set

  2. Supplementary Software (3,963 KB)

    DPT software in MATLAB and R
