Abstract
Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, the presence of concurrent gene processes in the same group of cells and technical noise can obscure the true progression of the processes studied. To address this challenge, we present GeneTrajectory, an approach that identifies trajectories of genes rather than trajectories of cells. Specifically, optimal transport distances are calculated between gene distributions across the cell–cell graph to extract gene programs and define their gene pseudotemporal order. Here we demonstrate that GeneTrajectory accurately extracts progressive gene dynamics in myeloid lineage maturation. Moreover, we show that GeneTrajectory deconvolves key gene programs underlying mouse skin hair follicle dermal condensate differentiation that could not be resolved by cell trajectory approaches. GeneTrajectory facilitates the discovery of gene programs that control the changes and activities of biological processes.
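The central computation described above, an optimal transport distance between two genes treated as distributions over a cell–cell graph, can be illustrated with a minimal sketch. This is not the GeneTrajectory package's API: the function names `graph_cost` and `gene_ot_distance` are hypothetical, the ground cost is assumed to be the shortest-path distance on the graph, and the transport problem is solved exactly as a small linear program, which is only practical for toy inputs.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path


def graph_cost(adjacency):
    """All-pairs shortest-path distances on the cell-cell graph (assumed ground cost)."""
    return shortest_path(np.asarray(adjacency, dtype=float), method="D", directed=False)


def gene_ot_distance(expr_a, expr_b, cost):
    """Wasserstein-1 distance between two genes viewed as distributions over cells."""
    p = np.asarray(expr_a, dtype=float)
    q = np.asarray(expr_b, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize expression to unit mass
    n = len(p)
    # Transport plan T is n x n, flattened row-major:
    # each row of T must sum to p[i], each column to q[j].
    row_sums = np.kron(np.eye(n), np.ones((1, n)))
    col_sums = np.kron(np.ones((1, n)), np.eye(n))
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    return result.fun


# Toy example: five cells connected in a line (0-1-2-3-4).
adjacency = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
cost = graph_cost(adjacency)
early_gene = [1, 0, 0, 0, 0]  # expressed only in the first cell
late_gene = [0, 0, 0, 0, 1]   # expressed only in the last cell
print(gene_ot_distance(early_gene, late_gene, cost))  # approximately 4.0
```

Genes expressed in nearby regions of the graph get small distances and genes expressed far apart get large ones, which is what allows genes, rather than cells, to be ordered.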
Data availability
The human PBMC scRNA-seq dataset is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3. The mouse embryonic skin dataset generated and analyzed in this study is available from the Gene Expression Omnibus with the accession GSE255534. The processed Seurat data objects for these two datasets are available at Figshare (https://doi.org/10.6084/m9.figshare.25243225). The Cyclebase gene list was extracted from Supplementary Table 5 in ref. 60.
Code availability
The R package of GeneTrajectory and the code used for data analysis are available on GitHub (https://github.com/KlugerLab/GeneTrajectory).
References
Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Skinner, S. O. et al. Single-cell analysis of transcription kinetics across the cell cycle. eLife 5, e12175 (2016).
Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).
Qu, R. et al. Decomposing a deterministic path to mesenchymal niche formation by two intersecting morphogen gradients. Dev. Cell 57, 1053–1067 (2022).
Macaulay, I. C. et al. Single-cell RNA-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells. Cell Rep. 14, 966–977 (2016).
Chu, L.-F. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 173 (2016).
Chen, R., Wu, X., Jiang, L. & Zhang, Y. Single-cell RNA-seq reveals hypothalamic cell diversity. Cell Rep. 18, 3227–3241 (2017).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).
Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).
Deconinck, L., Cannoodt, R., Saelens, W., Deplancke, B. & Saeys, Y. Recent advances in trajectory inference from single-cell omics data. Curr. Opin. Syst. Biol. 27, 100344 (2021).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).
Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
Lönnberg, T. et al. Single-cell RNA-seq and computational analysis using temporal mixture modeling resolves Th1/Tfh fate bifurcation in malaria. Sci. Immunol. 2, eaal2192 (2017).
Tritschler, S. et al. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 146, dev170506 (2019).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Ruijtenberg, S. & van den Heuvel, S. Coordinating cell proliferation and differentiation: antagonism between cell cycle regulators and cell type-specific gene expression. Cell Cycle 15, 196–212 (2016).
Rougny, A., Paulevé, L., Teboul, M. & Delaunay, F. A detailed map of coupled circadian clock and cell cycle with qualitative dynamics validation. BMC Bioinformatics 22, 240 (2021).
Gupta, K. et al. Single-cell analysis reveals a hair follicle dermal niche molecular differentiation trajectory that begins prior to morphogenesis. Dev. Cell 48, 17–31 (2019).
Sood, P. et al. Modular, cascade-like transcriptional program of regeneration in Stentor. eLife 11, e80778 (2022).
Zhu, H., Zhao, S. D., Ray, A., Zhang, Y. & Li, X. A comprehensive temporal patterning gene network in Drosophila medulla neuroblasts revealed by single-cell RNA sequencing. Nat. Commun. 13, 1247 (2022).
Li, J. et al. Systematic reconstruction of molecular cascades regulating GP development using single-cell RNA-seq. Cell Rep. 15, 1467–1480 (2016).
Huizing, G.-J., Peyré, G. & Cantini, L. Optimal transport improves cell–cell similarity inference in single-cell omics data. Bioinformatics 38, 2169–2177 (2022).
Bellazzi, R., Codegoni, A., Gualandi, S., Nicora, G. & Vercesi, E. The gene mover’s distance: single-cell similarity via optimal transport. Preprint at https://doi.org/10.48550/arXiv.2102.01218 (2021).
Orlova, D. Y. et al. Earth mover’s distance (EMD): a true metric for comparing biomarker expression levels in cell populations. PLoS ONE 11, e0151859 (2016).
Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).
Zhang, S., Afanassiev, A., Greenstreet, L., Matsumoto, T. & Schiebinger, G. Optimal transport analysis reveals trajectories in steady-state systems. PLoS Comput. Biol. 17, e1009466 (2021).
Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).
Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat. Protoc. 16, 4177–4200 (2021).
Demetci, P., Santorella, R., Sandstede, B., Noble, W. S. & Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol. 29, 3–18 (2022).
Coifman, R. R. & Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 21, 5–30 (2006).
Singer, A. From graph to manifold Laplacian: the convergence rate. Appl. Comput. Harmon. Anal. 21, 128–134 (2006).
Tacke, F. & Randolph, G. J. Migratory fate and differentiation of blood monocyte subsets. Immunobiology 211, 609–618 (2006).
Van de Veerdonk, F. L. & Netea, M. G. Diversity: a hallmark of monocyte society. Immunity 33, 289–291 (2010).
Patel, A. A. et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J. Exp. Med. 214, 1913–1923 (2017).
Chitu, V. & Stanley, E. R. Colony-stimulating factor-1 in immunity and inflammation. Curr. Opin. Immunol. 18, 39–48 (2006).
Imhof, B. A. & Dunon, D. Leukocyte migration and adhesion. Adv. Immunol. 58, 345–416 (1995).
Ghebrehiwet, B., Hosszu, K. K., Valentino, A., Ji, Y. & Peerschke, E. I. Monocyte expressed macromolecular C1 and C1q receptors as molecular sensors of danger: implications in SLE. Front. Immunol. 5, 278 (2014).
Heger, L. et al. Subsets of CD1c+ DCs: dendritic cell versus monocyte lineage. Front. Immunol. 11, 559166 (2020).
Higashi, N. et al. The macrophage C-type lectin specific for galactose/N-acetylgalactosamine is an endocytic receptor expressed on monocyte-derived immature dendritic cells. J. Biol. Chem. 277, 20686–20693 (2002).
Myung, P., Andl, T. & Atit, R. The origins of skin diversity: lessons from dermal fibroblasts. Development 149, dev200298 (2022).
Chen, D., Jarrell, A., Guo, C., Lang, R. & Atit, R. Dermal β-catenin activity in response to epidermal Wnt ligands is required for fibroblast proliferation and hair follicle initiation. Development 139, 1522–1533 (2012).
Fu, J. & Hsu, W. Epidermal Wnt controls hair follicle induction by orchestrating dynamic signaling crosstalk between the epidermis and dermis. J. Invest. Dermatol. 133, 890–898 (2013).
Hastie, T. J. Generalized Additive Models, pp. 249–307 (Routledge, 2017).
Wood, S. mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation (University of Bath, 2012).
Pott, S. & Lieb, J. D. Single-cell ATAC–seq: strength in numbers. Genome Biol. 16, 172 (2015).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Macaulay, I. C., Ponting, C. P. & Voet, T. Single-cell multiomics: multiple measurements from single cells. Trends Genet. 33, 155–168 (2017).
Balasubramanian, M. & Schwartz, E. L. The isomap algorithm and topological stability. Science 295, 7 (2002).
Bernstein, M., De Silva, V., Langford, J. C. & Tenenbaum, J. B. Graph Approximations to Geodesics on Embedded Manifolds Technical Report (Department of Psychology, Stanford University, 2000).
Dassule, H. R., Lewis, P., Bei, M., Maas, R. & McMahon, A. P. Sonic hedgehog regulates growth and morphogenesis of the tooth. Development 127, 4775–4785 (2000).
Carpenter, A. C., Rao, S., Wells, J. M., Campbell, K. & Lang, R. A. Generation of mice with a conditional null allele for Wntless. Genesis 48, 554–558 (2010).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Santos, A., Wernersson, R. & Jensen, L. J. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res. 43, D1140–D1144 (2015).
Liu, Z. et al. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat. Commun. 8, 22 (2017).
Günesdogan, U., Jäckle, H. & Herzig, A. Histone supply regulates S phase timing and cell cycle progression. eLife 3, e02443 (2014).
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).
Wood, S. & Wood, M. S. Package ‘mgcv’. scholar.google.com/citations?view_op=view_citation&hl=it&user=EskiIyEAAAAJ&citation_for_view=EskiIyEAAAAJ:kh2fBNsKQNwC (2015).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Acknowledgements
The authors thank J. Yang and M. Roulis for fruitful discussions. This study was supported by the National Institutes of Health (NIH) under grants R01GM131642 (to Y.K. and X.C.), UM1DA051410, U54AG076043, U54AG079759, P50CA121974 and U01DA053628 (to Y.K.). X.C. is also partially supported by the National Science Foundation (NSF) grant DMS-2237842. P.M. is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) grant R01AR076420.
Author information
Contributions
R.Q., X.C. and Y.K. conceived the project and designed the framework. R.Q. and X.C. developed the method. R.Q. performed data analysis and wrote the manuscript. X.C. developed the computation methodology based on mathematical theories and contributed to writing. P.M. performed the experiments and interpreted the findings. Y.K., P.M., J.S.S. and E.S. contributed to the writing and offered vital insights into improving the work. E.S., P.M., R.A.F. and I.D.O. contributed to the overall biological interpretation. B.L. and R.C. offered conceptual insights related to the theoretical framework. F.S. assisted in software implementation. S.P. assisted in experimental data analysis. J.G. assisted in writing.
Ethics declarations
Competing interests
R.A.F. is an advisor to GlaxoSmithKline, Zai Lab and Ventus Therapeutics. F.S. is employed as a director by PCMGF Limited. I.D.O. is the founder and president of Plythera and receives research funding from Ventus Therapeutics and SenTry.
Peer review
Peer review information
Nature Biotechnology thanks Xiaojie Qiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Simulation framework and dataset visualization.
a. Illustration of the GeneTrajectory simulation framework. A simple linear differentiation process simulation is shown. Each cell is associated with a pseudotime t along the process. For each gene, the expected expression level is modeled as a bell-shaped function of t, and the real expression level in a given cell is drawn from a Poisson distribution (see details in Methods). b. GeneTrajectory analysis of the simulated data in a. The first panel shows the UMAP embedding of cells; the second panel delineates the progressive dynamics of the simulated biological process with five genes selected along each process; the third to seventh panels show the expression of selected genes in the cell embedding following their pseudotemporal order; the eighth panel displays the UMAP embedding of genes, colored by the ground truth of gene pseudo-order. c–f. Gene-by-cell count matrices visualized by heatmaps (in log scale). Each row corresponds to a gene and each column corresponds to a cell. Each heatmap corresponds to a simulation example in Fig. 2.
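The simulation scheme in panel a can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the bell-shaped function is taken to be a Gaussian bump, and the function name `simulate_counts` and the parameters `peak_height` and `width` are hypothetical choices, not values from the paper.

```python
import numpy as np


def simulate_counts(n_cells=200, n_genes=50, peak_height=10.0, width=0.1, seed=0):
    """Simulate a gene-by-cell count matrix for one linear process."""
    rng = np.random.default_rng(seed)
    # Each cell gets a pseudotime t in [0, 1]; sorting is for readability only.
    t = np.sort(rng.uniform(0.0, 1.0, size=n_cells))
    # Each gene peaks at its own pseudotime; this is the ground-truth gene order.
    peaks = np.linspace(0.0, 1.0, n_genes)
    # Expected expression: a Gaussian bump centered on the gene's peak (assumption).
    mean = peak_height * np.exp(-0.5 * ((t[None, :] - peaks[:, None]) / width) ** 2)
    # Real expression: one Poisson draw per gene and cell.
    counts = rng.poisson(mean)
    return t, peaks, counts


t, peaks, counts = simulate_counts()
print(counts.shape)  # (50, 200): rows are genes, columns are cells
```

Because each gene's peak position is known, an inferred gene ordering can be compared directly against `peaks`.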
Extended Data Fig. 2 Myeloid cell type stratification.
a. UMAPs of selected well-studied myeloid gene markers identified along gene trajectories. b. Heatmap of cell-type specific gene markers (showing for each cell type the genes with the highest fold change in the average expression between that cell type and the remaining ones). c. Dot plot of cell-type specific gene markers in b. The color here indicates the average expression level of each gene in the corresponding cell type (after scaling).
Extended Data Fig. 3 Dermal cell type stratification.
a. UMAPs of gene expression profiles. b. Distribution of genes associated with different cell cycle phases along the CC gene trajectory.
Extended Data Fig. 4 Gene dynamics comparison between the wild type and Wls mutant.
a. Gene bin plots of the LD gene trajectory, split by condition. b. Gene bin plots of the CC gene trajectory, split by condition. c. Cell UMAPs colored by cell state, with states categorized into multiple stages, split by condition. d. Change of Lef1 (Wnt) level across all stages, split by condition. Lef1 level is uniformly lower in the Wls KO than in the wild type. The box represents the interquartile range (IQR), with the line inside the box indicating the median. Whiskers extend to a maximum of 1.5× IQR beyond the box, with outliers represented as individual points.
Extended Data Fig. 5 Gene ordering results obtained by different methods on the dermal condensate genesis data.
The orderings of key genes activated during the dermal condensate differentiation process are delineated. Cell cycle effects were regressed out when constructing the cell graph.
Supplementary information
Supplementary Information
Supplementary Figs. 1–7.
Supplementary Tables
Supplementary Table 1: Simulation evaluation outputs (corresponding to Fig. 2). Spearman correlation is calculated between the inferred ordering and the ground truth for each underlying process in each example (each has ten replicates).
Supplementary Table 2: Gene ordering along the DC gene trajectory. The first column lists the genes identified on this trajectory. The second column indicates the sequential order of each gene along the trajectory. The third column designates the specific gene bin to which each gene has been assigned.
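The evaluation metric named for Supplementary Table 1, a Spearman correlation between an inferred gene ordering and the ground truth, can be computed as in the short sketch below. The function name `ordering_score` and the toy orderings are hypothetical and are not taken from the supplementary data.

```python
from scipy.stats import spearmanr


def ordering_score(inferred_order, true_order):
    """Spearman rank correlation between inferred and ground-truth gene positions."""
    rho, _ = spearmanr(inferred_order, true_order)
    return rho


# Five genes; the inferred ordering swaps the second and third gene.
true_order = [0, 1, 2, 3, 4]
inferred_order = [0, 2, 1, 3, 4]
print(ordering_score(inferred_order, true_order))  # approximately 0.9
```

A score of 1 means the inferred ordering matches the ground truth exactly, and a score of -1 means it is fully reversed.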
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qu, R., Cheng, X., Sefik, E. et al. Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02186-3
DOI: https://doi.org/10.1038/s41587-024-02186-3