Single-cell RNA sequencing and high-dimensional cytometry can be used to generate detailed trajectories of dynamic biological processes such as differentiation or development. Here we present cellAlign, a quantitative framework for comparing expression dynamics within and between single-cell trajectories. By applying cellAlign to mouse and human embryonic developmental trajectories, we systematically delineate differences in the temporal regulation of gene expression programs that would otherwise be masked.
This study was supported by the Israel Science Foundation (ISF) grant 1365/12 and the Applebaum Foundation. We thank Y. Ofran, M. Drukker, N. Kaplan, K. Brennand, and members of the Shen-Orr lab for fruitful discussions, and D. Alpert for assistance in algorithmic design and technical support.
S.S.S.-O. is a scientific adviser and holds equity in CytoReason. All other authors declare no competing interests.
Integrated supplementary information
Supplementary Figure 1 Rescaling of pseudo-time reduces technical impact of sampling density and trajectory building algorithm.
(a) Points distributed along a linear curve in a 200-dimensional space were sampled either uniformly or with a bias that increases the density at the trajectory midpoint. A trajectory was built for each sampling with the Monocle2 algorithm; the resulting distributions of pseudo-time scores are shown, with color denoting the magnitude of the sampling bias. (b) Absolute pseudo-time shift elements (n=200 elements) resulting from alignment of the uniformly sampled trajectory with the bias-sampled trajectories, with (light blue) or without (pink) pseudo-time rescaling. (c) Absolute pseudo-time shift elements (n=200 elements) resulting from alignment of the uniformly sampled trajectory generated by different trajectory-building algorithms, with (light blue) or without (pink) pseudo-time rescaling. In each boxplot, the center, minimum, maximum, and range denote the median, the lowest value within 1.5 IQR of the lower quartile, the highest value within 1.5 IQR of the upper quartile, and the interquartile range, respectively. Abbreviations: DPT: diffusion pseudo-time.
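The boxplot convention described in this caption can be made concrete with a short sketch. The function name `boxplot_summary`, the toy values, and the quartile interpolation rule (Python's `statistics.quantiles` with `method="inclusive"`) are assumptions for illustration; the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Return the boxplot elements named in the caption for a list of numbers.

    Assumption: quartiles use the "inclusive" interpolation rule; other
    plotting tools may differ slightly.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "median": median(values),
        "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": min(v for v in values if v >= lower_fence),
        "whisker_high": max(v for v in values if v <= upper_fence),
    }


# Hypothetical shift values, including one outlier (100).
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy input the value 100 lies above the upper fence, so the upper whisker stops at the largest value inside the fence rather than at the maximum.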
Supplementary Figure 2 Alignment-based distance calculated by cellAlign reflects known biological similarities.
Hierarchical clustering of the eight single-cell RNA sequencing trajectories analyzed throughout the manuscript, using alignment-based distances calculated by cellAlign. Abbreviations: SVZ: sub-ventricular zone; DC: dendritic cells.
Supplementary Figure 3 Assembly of single-cell trajectories describing gene-expression dynamics following dendritic cell stimulation by LPS and PAM.
The diffusion maps algorithm applied to single-cell RNA sequencing data of dendritic cells stimulated with either LPS or PAM. Dots represent single cells after either LPS (a, c) or PAM (b, d) stimulation. Colors denote either the actual collection time post-stimulation (a, b) or scaled pseudo-time scores (c, d) calculated by the diffusion pseudo-time algorithm. Abbreviations: DC: diffusion component.
Scaled imputed expression (z-scores) of genes included in the modules used for either global (a, b) or local (c, d) alignment under either LPS (a, c) or PAM (b, d) stimulation. Genes are ordered along the rows, while single cells are ordered by their pseudo-time scores along the columns. Collection times and pseudo-time scores per single cell are displayed at the top.
Dissimilarity matrix and local alignment between expression dynamics of a single gene module along LPS (left to right) and PAM (top to bottom) trajectories under a similarity threshold of 0.25 of the range of values in the dissimilarity matrix.
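One plausible reading of this threshold, offered as an assumption rather than as cellAlign's actual rule, is that a pair of points qualifies for local alignment when its dissimilarity lies within the lowest quarter of the matrix's value range. The function name `candidate_mask`, the toy matrix, and the choice to measure the threshold upward from the matrix minimum are all hypothetical.

```python
def candidate_mask(dissimilarity, fraction=0.25):
    """Mark entries whose dissimilarity is within `fraction` of the value range.

    Assumption: the threshold is measured upward from the matrix minimum.
    """
    flat = [v for row in dissimilarity for v in row]
    lowest, highest = min(flat), max(flat)
    threshold = lowest + fraction * (highest - lowest)
    mask = [[v <= threshold for v in row] for row in dissimilarity]
    return threshold, mask


# Toy 3 x 3 dissimilarity matrix (hypothetical values).
threshold, mask = candidate_mask([[0.0, 2.0, 4.0],
                                  [1.0, 0.5, 3.0],
                                  [4.0, 2.0, 0.0]])
```

Under this reading, only the entries marked `True` would be eligible to form a locally conserved region.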
(a) The effect of increasing levels of noise, in the form of additional dropout events, on local alignment. Assessment was performed by calculating the fraction of the originally identified, locally conserved region along the PAM trajectory that was still identified in the noised data. Error bars and centers denote standard deviations and mean values across n=500 simulations. (b) Performance of cellAlign on data processed by different preprocessing techniques, including cell subsampling (purple), normalization method (turquoise), trajectory-building algorithm (red), and application of data imputation (green). Spearman correlations of alignment-based distances calculated across gene modules between the original and the modified expression data are shown. For cell subsampling, the average and standard deviation across ten simulations are shown as error bars. (c) Spearman correlations between alignment-based distances calculated across modules of co-expressed genes using expression data obtained either without batch correction or corrected by ComBat or limma.
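The robustness assessment in panel (a) can be sketched as follows. This is a minimal illustration under stated assumptions: dropout is simulated by independently zeroing entries with a fixed probability, and the conserved region is represented as a set of point indices. The names `add_dropout` and `fraction_retained` and all toy values are hypothetical and not part of cellAlign.

```python
import random


def add_dropout(expression, rate, rng):
    """Return a copy of a genes x cells matrix with extra zeros.

    Assumption: each entry is independently set to zero with probability `rate`.
    """
    return [[0.0 if rng.random() < rate else v for v in row] for row in expression]


def fraction_retained(original_region, noised_region):
    """Fraction of the originally conserved points still found after adding noise."""
    original = set(original_region)
    if not original:
        raise ValueError("original region is empty")
    return len(original & set(noised_region)) / len(original)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noised = add_dropout([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rate=0.3, rng=rng)

# Hypothetical indices of locally aligned points before and after adding noise.
retained = fraction_retained(range(10, 30), range(15, 40))
```

Repeating the dropout step many times and averaging `fraction_retained` would give a mean and standard deviation comparable in form to the error bars described in the caption.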
Supplementary Figure 7 Validation and performance of cellAlign on non-imputed single-cell RNA sequencing data.
(a-b) Correlation between pseudo-time scores obtained from the normalized data (y-axes) versus those obtained from the imputed data (x-axes) for LPS (a) and PAM (b) trajectories. Scaled interpolated expression of the 12 different gene modules, calculated using either the normalized or the imputed data, is displayed beside and below the axes, respectively. Names of the gene modules, as they appear in the original publication, are placed along a diagonal path to the left of and below the expression dynamics. Spearman correlation values between the imputed and non-imputed expression values appear on top. (c-f) Non-imputed, scaled expression (z-scores) of genes partitioned into the modules used for either global (c, d) or local (e, f) alignment under either LPS (c, e) or PAM (d, f) stimulation. Genes are ordered along the rows, while single cells are ordered by their pseudo-time scores along the columns. Collection times and pseudo-time scores per single cell are displayed at the top. White matrix elements correspond to missing values (zeros). (g) Overall non-imputed, scaled expression of a gene module by single cells (colors denote collection time) and interpolated points (black) along PAM (top) and LPS (bottom) trajectories. Values correspond to the 0.6 quantile of scaled expression across the genes of the module. Dashed lines are point alignments from cellAlign global alignment. (h) Dissimilarity matrix and cellAlign global alignment of the non-imputed expression dynamics of one gene module along LPS (left to right) and PAM (top to bottom) trajectories. Histograms depict the distributions of collection time points along either trajectory. Absolute values and signs of the pseudo-time shift elements, as derived from the alignment, are displayed as color bars at the bottom.
(i) Dissimilarity matrix and local alignment between non-imputed expression dynamics of a single gene module along LPS (left to right) and PAM (top to bottom) trajectories under a similarity threshold of 0.25 of the range of values in the dissimilarity matrix. (j) Overall non-imputed expression of a gene module whose expression pattern is conserved early following both LPS and PAM stimuli and deviates afterwards. Values correspond to the median interpolated scaled expression across module genes. Locally aligned points are red, whereas unaligned, non-conserved points in the trajectory are gray. (k) Distributions of the single-gene correlation difference between the aligned and unaligned trajectories for global (pink, two-sided t-test p-value = 3 × 10⁻⁹, n = 67 genes) and local (top, two-sided t-test p-value = 0.027, n = 60 genes) alignment. Only genes with fewer than 200 missing values overall in both LPS- and PAM-stimulated single cells were included in this analysis.
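The module-level summaries in panels (g) and (j) reduce many genes to a single value per point by taking a quantile (0.6, or the median) of scaled expression across the module's genes. A minimal sketch, assuming linear interpolation between order statistics; the function names and toy values are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (an assumption)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def module_summary(expression, q):
    """Summarize a genes x points matrix by the q-quantile across genes per point."""
    return [quantile(column, q) for column in zip(*expression)]


# Toy module: 6 genes (rows) measured at 2 points (columns).
module = [[0.0, 5.0], [1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0], [5.0, 0.0]]
summary_q60 = module_summary(module, 0.6)
summary_median = module_summary(module, 0.5)
```

Each entry of the result is one point on a module-level expression curve of the kind that is then compared between the LPS and PAM trajectories.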
Supplementary Figure 8 Assembly of single-cell trajectories describing gene-expression dynamics during human and mouse preimplantation embryogenesis.
PCA applied to single-cell RNA sequencing data from n=80 single cells of human origin (a,b) and n=280 single cells of mouse origin (c,d) during preimplantation embryogenesis. Dots represent single cells. Colors denote either developmental stages (a,c) or scaled pseudo-time scores (b,d) as calculated by the Waterfall algorithm. Abbreviations: PC: principal component.
Supplementary Figure 9 Application of cellAlign on mouse and human preimplantation trajectories can be used to discriminate between genes with distinct dynamics rates in both species.
(a-b) Scaled expression (z-scores) of genes assigned to embryonic (a) and maternal (b) clusters across human developmental stages. (c) Scatterplot of the ordering of genes by their upregulation timing along human (x-axis) and mouse (y-axis) trajectories, showing high conservation of temporal regulation (n=288 genes, r = 0.18, linear regression p-value = 0.00173), yet marked differential regulation for a subset of genes. Color denotes point density.
Supplementary Figure 10 Global alignment of B-cell-related protein expression dynamics along B-cell lymphopoiesis in human and mouse mass cytometry data.
(a) A scheme of the comparison of expression dynamics during B-cell lymphopoiesis between human and mouse. (b) A dissimilarity matrix and global alignment of marker expression dynamics between human and mouse B-cell lymphopoiesis. (c, d) Expression dynamics of B-cell-related markers along human (c) and mouse (d) B-cell lymphopoiesis. (e) Gating strategy for the mouse B-cell mass cytometry dataset.
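A global alignment through a dissimilarity matrix, as in panel (b), can be illustrated with generic dynamic time warping. This is a sketch of the general technique, not cellAlign's implementation: the absolute-difference dissimilarity, the step pattern (one step right, down, or diagonally), the function names, and the toy trajectories are all assumptions.

```python
def dissimilarity_matrix(x, y):
    """Pairwise absolute differences between two 1-D expression profiles."""
    return [[abs(a - b) for b in y] for a in x]


def global_alignment(d):
    """Dynamic time warping: minimum-cost monotone path from (0, 0) to the last cell."""
    n, m = len(d), len(d[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = d[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_previous = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d[i][j] + best_previous
    # Trace the optimal path back from the final cell to the origin.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


# Toy profiles: the second one lingers at its starting level before rising.
total_cost, path = global_alignment(dissimilarity_matrix([0, 1, 2, 3], [0, 0, 1, 2, 3]))
```

Each pair (i, j) in the returned path matches point i of the first profile to point j of the second; a repeated index marks a stretch where the two profiles progress at different rates.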
Supplementary Figures 1–10 and Supplementary Notes 1–5 (PDF 2638 kb)
Functional enrichment of embryonic and maternal gene-clusters (XLSX 19 kb)
Antibodies used for mass cytometry experiment (XLSX 11 kb)
cellAlign package (ZIP 735 kb)
About this article
Cite this article
Alpert, A., Moore, L., Dubovik, T. et al. Alignment of single-cell trajectories to compare cellular expression dynamics. Nat Methods 15, 267–270 (2018). https://doi.org/10.1038/nmeth.4628