Gene trajectory inference for single-cell data by optimal transport metrics

Abstract

Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, concurrent gene processes in the same group of cells, together with technical noise, can obscure the true progression of the processes studied. To address this challenge, we present GeneTrajectory, an approach that identifies trajectories of genes rather than trajectories of cells. Specifically, optimal transport distances are calculated between gene distributions across the cell–cell graph to extract gene programs and define the pseudotemporal order of their genes. Here we demonstrate that GeneTrajectory accurately extracts progressive gene dynamics in myeloid lineage maturation. Moreover, we show that GeneTrajectory deconvolves key gene programs underlying mouse skin hair follicle dermal condensate differentiation that could not be resolved by cell trajectory approaches. GeneTrajectory facilitates the discovery of gene programs that control the changes and activities of biological processes.
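The core step named above is an optimal transport (OT) distance between gene distributions on a cell–cell graph. The sketch below is a minimal illustration, not the GeneTrajectory implementation: it assumes each gene's expression is normalized into a probability distribution over cells, uses shortest-path distances on a small graph as the ground cost, and solves the exact earth mover's transport problem as a linear program with SciPy. All function names are hypothetical, and the package's own graph construction, cost and solver may differ.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path


def graph_ground_cost(adjacency):
    """All-pairs shortest-path distances on an undirected, weighted cell-cell graph.

    Zero entries of the dense adjacency matrix are treated as missing edges.
    """
    return shortest_path(np.asarray(adjacency, dtype=float), directed=False)


def to_distribution(expression):
    """Normalize one gene's non-negative expression over cells so it sums to 1."""
    x = np.asarray(expression, dtype=float)
    return x / x.sum()


def emd(p, q, cost):
    """Exact earth mover's distance between p and q, solved as a linear program."""
    n = len(p)
    # Variables: the transport plan T (n x n), flattened row-major.
    rows = np.kron(np.eye(n), np.ones((1, n)))  # sum_j T[i, j] = p[i]
    cols = np.kron(np.ones((1, n)), np.eye(n))  # sum_i T[i, j] = q[j]
    res = linprog(
        cost.reshape(-1),
        A_eq=np.vstack([rows, cols]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    return res.fun


def gene_gene_ot_distances(expr, adjacency):
    """Pairwise OT distances between genes (rows of a genes x cells matrix)."""
    cost = graph_ground_cost(adjacency)
    probs = [to_distribution(g) for g in np.asarray(expr, dtype=float)]
    n_genes = len(probs)
    d = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            d[i, j] = d[j, i] = emd(probs[i], probs[j], cost)
    return d
```

On a three-cell path graph (0–1–2) with three genes, each expressed in a single cell, the resulting gene–gene distances are 1 between neighbouring genes and 2 between the two end genes, i.e. the graph distance between where the genes are expressed. The dense LP has n² variables for n cells, so this form only suits toy graphs; larger problems would need a dedicated OT solver.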

Fig. 1: Overview of GeneTrajectory.
Fig. 2: GeneTrajectory performance assessment based on simulation experiments.
Fig. 3: Gene trajectory inference on a myeloid scRNA-seq dataset.
Fig. 4: GeneTrajectory deconvolves two mixed processes during DC genesis.
Fig. 5: Gene dynamics comparative analysis.
Fig. 6: GeneTrajectory outperforms other methods in inferring gene ordering along concurrent processes.

Data availability

The human PBMC scRNA-seq dataset is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3. The mouse embryonic skin dataset generated and analyzed in this study is available from the Gene Expression Omnibus under accession GSE255534. The processed Seurat data objects for these two datasets are available at Figshare (https://doi.org/10.6084/m9.figshare.25243225). The Cyclebase gene list was extracted from Supplementary Table 5 in ref. 60.

Code availability

The R package of GeneTrajectory and the code used for data analysis are available on GitHub (https://github.com/KlugerLab/GeneTrajectory).

References

  1. Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021).

  2. Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).

  3. Skinner, S. O. et al. Single-cell analysis of transcription kinetics across the cell cycle. eLife 5, e12175 (2016).

  4. Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).

  5. Qu, R. et al. Decomposing a deterministic path to mesenchymal niche formation by two intersecting morphogen gradients. Dev. Cell 57, 1053–1067 (2022).

  6. Macaulay, I. C. et al. Single-cell RNA-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells. Cell Rep. 14, 966–977 (2016).

  7. Chu, L.-F. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 173 (2016).

  8. Chen, R., Wu, X., Jiang, L. & Zhang, Y. Single-cell RNA-seq reveals hypothalamic cell diversity. Cell Rep. 18, 3227–3241 (2017).

  9. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).

  10. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  11. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).

  12. Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).

  13. Deconinck, L., Cannoodt, R., Saelens, W., Deplancke, B. & Saeys, Y. Recent advances in trajectory inference from single-cell omics data. Curr. Opin. Syst. Biol. 27, 100344 (2021).

  14. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

  15. Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).

  16. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).

  17. Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

  18. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

  19. Lönnberg, T. et al. Single-cell RNA-seq and computational analysis using temporal mixture modeling resolves Th1/Tfh fate bifurcation in malaria. Sci. Immunol. 2, eaal2192 (2017).

  20. Tritschler, S. et al. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 146, dev170506 (2019).

  21. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).

  22. Ruijtenberg, S. & van den Heuvel, S. Coordinating cell proliferation and differentiation: antagonism between cell cycle regulators and cell type-specific gene expression. Cell Cycle 15, 196–212 (2016).

  23. Rougny, A., Paulevé, L., Teboul, M. & Delaunay, F. A detailed map of coupled circadian clock and cell cycle with qualitative dynamics validation. BMC Bioinformatics 22, 240 (2021).

  24. Gupta, K. et al. Single-cell analysis reveals a hair follicle dermal niche molecular differentiation trajectory that begins prior to morphogenesis. Dev. Cell 48, 17–31 (2019).

  25. Sood, P. et al. Modular, cascade-like transcriptional program of regeneration in Stentor. eLife 11, e80778 (2022).

  26. Zhu, H., Zhao, S. D., Ray, A., Zhang, Y. & Li, X. A comprehensive temporal patterning gene network in Drosophila medulla neuroblasts revealed by single-cell RNA sequencing. Nat. Commun. 13, 1247 (2022).

  27. Li, J. et al. Systematic reconstruction of molecular cascades regulating GP development using single-cell RNA-seq. Cell Rep. 15, 1467–1480 (2016).

  28. Huizing, G.-J., Peyré, G. & Cantini, L. Optimal transport improves cell–cell similarity inference in single-cell omics data. Bioinformatics 38, 2169–2177 (2022).

  29. Bellazzi, R., Codegoni, A., Gualandi, S., Nicora, G. & Vercesi, E. The gene mover’s distance: single-cell similarity via optimal transport. Preprint at https://doi.org/10.48550/arXiv.2102.01218 (2021).

  30. Orlova, D. Y. et al. Earth mover’s distance (EMD): a true metric for comparing biomarker expression levels in cell populations. PLoS ONE 11, e0151859 (2016).

  31. Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).

  32. Zhang, S., Afanassiev, A., Greenstreet, L., Matsumoto, T. & Schiebinger, G. Optimal transport analysis reveals trajectories in steady-state systems. PLoS Comput. Biol. 17, e1009466 (2021).

  33. Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).

  34. Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat. Protoc. 16, 4177–4200 (2021).

  35. Demetci, P., Santorella, R., Sandstede, B., Noble, W. S. & Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol. 29, 3–18 (2022).

  36. Coifman, R. R. & Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 21, 5–30 (2006).

  37. Singer, A. From graph to manifold Laplacian: the convergence rate. Appl. Comput. Harmon. Anal. 21, 128–134 (2006).

  38. Tacke, F. & Randolph, G. J. Migratory fate and differentiation of blood monocyte subsets. Immunobiology 211, 609–618 (2006).

  39. Van de Veerdonk, F. L. & Netea, M. G. Diversity: a hallmark of monocyte society. Immunity 33, 289–291 (2010).

  40. Patel, A. A. et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J. Exp. Med. 214, 1913–1923 (2017).

  41. Chitu, V. & Stanley, E. R. Colony-stimulating factor-1 in immunity and inflammation. Curr. Opin. Immunol. 18, 39–48 (2006).

  42. Imhof, B. A. & Dunon, D. Leukocyte migration and adhesion. Adv. Immunol. 58, 345–416 (1995).

  43. Ghebrehiwet, B., Hosszu, K. K., Valentino, A., Ji, Y. & Peerschke, E. I. Monocyte expressed macromolecular C1 and C1q receptors as molecular sensors of danger: implications in SLE. Front. Immunol. 5, 278 (2014).

  44. Heger, L. et al. Subsets of CD1c+ DCs: dendritic cell versus monocyte lineage. Front. Immunol. 11, 559166 (2020).

  45. Higashi, N. et al. The macrophage C-type lectin specific for galactose/N-acetylgalactosamine is an endocytic receptor expressed on monocyte-derived immature dendritic cells. J. Biol. Chem. 277, 20686–20693 (2002).

  46. Myung, P., Andl, T. & Atit, R. The origins of skin diversity: lessons from dermal fibroblasts. Development 149, dev200298 (2022).

  47. Chen, D., Jarrell, A., Guo, C., Lang, R. & Atit, R. Dermal β-catenin activity in response to epidermal Wnt ligands is required for fibroblast proliferation and hair follicle initiation. Development 139, 1522–1533 (2012).

  48. Fu, J. & Hsu, W. Epidermal Wnt controls hair follicle induction by orchestrating dynamic signaling crosstalk between the epidermis and dermis. J. Invest. Dermatol. 133, 890–898 (2013).

  49. Hastie, T. J. Generalized Additive Models, pp. 249–307 (Routledge, 2017).

  50. Wood, S. mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation (University of Bath, 2012).

  51. Pott, S. & Lieb, J. D. Single-cell ATAC–seq: strength in numbers. Genome Biol. 16, 172 (2015).

  52. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).

  53. Macaulay, I. C., Ponting, C. P. & Voet, T. Single-cell multiomics: multiple measurements from single cells. Trends Genet. 33, 155–168 (2017).

  54. Balasubramanian, M. & Schwartz, E. L. The isomap algorithm and topological stability. Science 295, 7 (2002).

  55. Bernstein, M., De Silva, V., Langford, J. C. & Tenenbaum, J. B. Graph Approximations to Geodesics on Embedded Manifolds Technical Report (Department of Psychology, Stanford University, 2000).

  56. Dassule, H. R., Lewis, P., Bei, M., Maas, R. & McMahon, A. P. Sonic hedgehog regulates growth and morphogenesis of the tooth. Development 127, 4775–4785 (2000).

  57. Carpenter, A. C., Rao, S., Wells, J. M., Campbell, K. & Lang, R. A. Generation of mice with a conditional null allele for Wntless. Genesis 48, 554–558 (2010).

  58. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

  59. Santos, A., Wernersson, R. & Jensen, L. J. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res. 43, D1140–D1144 (2015).

  60. Liu, Z. et al. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat. Commun. 8, 22 (2017).

  61. Günesdogan, U., Jäckle, H. & Herzig, A. Histone supply regulates S phase timing and cell cycle progression. eLife 3, e02443 (2014).

  62. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).

  63. Wood, S. & Wood, M. S. Package ‘mgcv’. scholar.google.com/citations?view_op=view_citation&hl=it&user=EskiIyEAAAAJ&citation_for_view=EskiIyEAAAAJ:kh2fBNsKQNwC (2015).

  64. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

Acknowledgements

The authors thank J. Yang and M. Roulis for fruitful discussions. This study was supported by the National Institutes of Health (NIH) under grants R01GM131642 (to Y.K. and X.C.), UM1DA051410, U54AG076043, U54AG079759, P50CA121974 and U01DA053628 (to Y.K.). X.C. is also partially supported by the National Science Foundation (NSF) grant DMS-2237842. P.M. is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) grant R01AR076420.

Author information

Contributions

R.Q., X.C. and Y.K. conceived the project and designed the framework. R.Q. and X.C. developed the method. R.Q. performed data analysis and wrote the manuscript. X.C. developed the computational methodology based on mathematical theories and contributed to writing. P.M. performed the experiments and interpreted the findings. Y.K., P.M., J.S.S. and E.S. contributed to the writing and offered vital insights into improving the work. E.S., P.M., R.A.F. and I.D.O. contributed to the overall biological interpretation. B.L. and R.C. offered conceptual insights related to the theoretical framework. F.S. assisted in software implementation. S.P. assisted in experimental data analysis. J.G. assisted in writing.

Corresponding author

Correspondence to Yuval Kluger.

Ethics declarations

Competing interests

R.A.F. is an advisor to GlaxoSmithKline, Zai Lab and Ventus Therapeutics. F.S. is employed as a director by PCMGF Limited. I.D.O. is the founder and president of Plythera and receives research funding from Ventus Therapeutics and SenTry.

Peer review

Peer review information

Nature Biotechnology thanks Xiaojie Qiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Simulation framework and dataset visualization.

a. Illustration of the GeneTrajectory simulation framework. A simple linear differentiation process simulation is shown. Each cell is associated with a pseudotime t along the process. For each gene, its expected expression level is modeled as a bell-shaped function of t, and its observed expression level in a given cell is drawn from a Poisson distribution (see details in Methods). b. GeneTrajectory analysis of the simulated data in a. The first panel shows the UMAP embedding of cells; the second panel delineates the progressive dynamics of the simulated biological process, with five genes selected along each process; the 3rd–7th panels show the expression of the selected genes in the cell embedding, following their pseudotemporal order; the 8th panel displays the UMAP embedding of genes, colored by the ground-truth gene pseudo-order. c–f. Gene-by-cell count matrices visualized as heatmaps (log scale). Each row corresponds to a gene and each column to a cell. Each heatmap corresponds to a simulation example in Fig. 2.
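Panel a describes the generative model only at a high level; the exact specification is in the Methods, which are not reproduced here. A minimal sketch of such a linear process, assuming a Gaussian-shaped mean expression curve for each gene, evenly spaced peak times as the ground-truth gene order, and hypothetical `width` and `amplitude` parameters:

```python
import numpy as np


def simulate_linear_process(n_cells=200, n_genes=50, width=0.1, amplitude=5.0, seed=0):
    """Hypothetical linear-process simulation: bell-shaped means, Poisson counts.

    Returns cell pseudotimes, gene peak times (ground-truth order),
    expected expression (genes x cells) and sampled counts (genes x cells).
    """
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, 1.0, size=n_cells))  # cell pseudotime along the process
    peaks = np.linspace(0.0, 1.0, n_genes)  # each gene peaks at its own time
    # Expected expression: a Gaussian bump centred on each gene's peak time.
    mu = amplitude * np.exp(-((t[None, :] - peaks[:, None]) ** 2) / (2.0 * width**2))
    counts = rng.poisson(mu)  # observed counts drawn from Poisson(mu)
    return t, peaks, mu, counts
```

Because cells are sorted by pseudotime, the cell at which each gene's expected expression peaks moves monotonically along the process as the gene's peak time increases, which is the ground-truth ordering a gene trajectory method should recover.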

Extended Data Fig. 2 Myeloid cell type stratification.

a. UMAPs of selected well-studied myeloid gene markers identified along gene trajectories. b. Heatmap of cell-type specific gene markers (showing for each cell type the genes with the highest fold change in the average expression between that cell type and the remaining ones). c. Dot plot of cell-type specific gene markers in b. The color here indicates the average expression level of each gene in the corresponding cell type (after scaling).
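Panel b ranks markers by the fold change in average expression between one cell type and all remaining cells. A minimal sketch of that ranking, assuming a genes × cells matrix of normalized expression and a pseudocount to avoid division by zero (both are assumptions; the normalization and any additional filtering used for the figure are not specified here, and `top_markers` is a hypothetical name):

```python
import numpy as np


def top_markers(expr, labels, genes, k=3, pseudocount=1.0):
    """For each cell type, return the k genes with the largest log2 fold change
    of mean expression in that type versus all remaining cells."""
    expr = np.asarray(expr, dtype=float)  # genes x cells
    labels = np.asarray(labels)
    markers = {}
    for cell_type in np.unique(labels):
        in_type = labels == cell_type
        mean_in = expr[:, in_type].mean(axis=1)
        mean_out = expr[:, ~in_type].mean(axis=1)
        log_fc = np.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        top = np.argsort(-log_fc)[:k]  # indices of the largest fold changes
        markers[str(cell_type)] = [genes[i] for i in top]
    return markers
```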

Extended Data Fig. 3 Dermal cell type stratification.

a. UMAPs of gene expression profiles. b. Distribution of genes associated with different cell cycle phases along the CC gene trajectory.

Extended Data Fig. 4 Gene dynamics comparison between the wild type and Wls mutant.

a. Gene bin plots of the LD gene trajectory, split by condition. b. Gene bin plots of the CC gene trajectory, split by condition. c. Cell UMAPs colored by cell state, with cells categorized into multiple stages, split by condition. d. Change in Lef1 (Wnt) level across all stages, split by condition. Lef1 level is uniformly lower in the Wls KO than in the wild type. The box represents the interquartile range (IQR), with the line inside the box indicating the median. Whiskers extend at most 1.5× IQR beyond the box, with outliers shown as individual points.
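Panel d follows the standard Tukey box-plot convention. As a small worked illustration (not the plotting code used for the figure), the box, whisker limits and outliers can be computed as below. NumPy's default linear percentile interpolation matches R's default quantile type 7; the quantile rule actually used for the figure is not stated, so treat that choice as an assumption.

```python
import numpy as np


def tukey_box_stats(values):
    """Box (Q1, median, Q3), whisker ends and outliers under the 1.5 x IQR rule."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations within the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": v[(v < low_fence) | (v > high_fence)],
    }
```

For the values 1 to 9 plus 100, this gives Q1 = 3.25, median = 5.5 and Q3 = 7.75, so the upper fence is 14.5, the whiskers end at 1 and 9, and 100 is drawn as an outlier.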

Extended Data Fig. 5 Gene ordering results obtained by different methods on the dermal condensate genesis data.

The orderings of key genes activated during the dermal condensate differentiation process are delineated. Cell cycle effects were regressed out when constructing the cell graph.
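The caption states that cell cycle effects were regressed out before the cell graph was built, but the exact procedure is not described in this section. A common approach, sketched here purely as an assumption, is to fit each gene's expression on per-cell cell cycle scores (for example, S and G2/M phase scores) by ordinary least squares and keep the residuals; `regress_out` is a hypothetical name.

```python
import numpy as np


def regress_out(expr, covariates):
    """Residualize each gene (row of a genes x cells matrix) on per-cell covariates.

    Fits an intercept plus the covariates by least squares, jointly for all genes,
    and returns the residuals with the same shape as `expr`.
    """
    x = np.asarray(expr, dtype=float)  # genes x cells
    c = np.asarray(covariates, dtype=float)  # cells x k (or a single score per cell)
    if c.ndim == 1:
        c = c[:, None]
    design = np.column_stack([np.ones(c.shape[0]), c])  # cells x (k + 1)
    coef, *_ = np.linalg.lstsq(design, x.T, rcond=None)  # (k + 1) x genes
    return (x.T - design @ coef).T
```

By construction the residuals are uncorrelated with the supplied scores across cells, so variation explained linearly by those scores no longer drives cell–cell similarities; nonlinear cell cycle effects would not be removed by this step.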

Supplementary information

Supplementary Information

Supplementary Figs. 1–7.

Reporting Summary

Supplementary Tables

Supplementary Table 1: Simulation evaluation outputs (corresponding to Fig. 2). Spearman correlation is calculated between the inferred ordering and the ground truth for each underlying process in each example (each example has ten replicates).

Supplementary Table 2: Gene ordering along the DC gene trajectory. The first column lists the genes identified on this trajectory, the second column gives the sequential order of each gene along the trajectory, and the third column gives the gene bin to which each gene is assigned.
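Both tables rest on simple bookkeeping over a gene ordering. The sketch below computes the Spearman correlation between an inferred ordering and the ground truth, and builds a three-column (gene, order, bin) table. Splitting the ordered genes into consecutive, near-equal bins is an assumption, since the binning rule is not given here, and both function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def ordering_spearman(inferred_order, true_order):
    """Spearman correlation between an inferred gene ordering and the ground truth."""
    rho, _pvalue = spearmanr(inferred_order, true_order)
    return float(rho)


def gene_order_table(ordered_genes, n_bins):
    """Rows of (gene, 1-based position along the trajectory, 1-based bin)."""
    positions = np.arange(len(ordered_genes))
    table = []
    # np.array_split yields consecutive chunks whose sizes differ by at most one.
    for bin_id, chunk in enumerate(np.array_split(positions, n_bins), start=1):
        for pos in chunk:
            table.append((ordered_genes[pos], int(pos) + 1, bin_id))
    return table
```

A perfectly recovered ordering gives a correlation of 1 and a fully reversed one gives -1; with ten replicates per example, the correlation would be computed once per process and replicate.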

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qu, R., Cheng, X., Sefik, E. et al. Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02186-3

  • DOI: https://doi.org/10.1038/s41587-024-02186-3
