Date: 2024-04-05
Abstract
Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, the presence of concurrent gene processes in the same group of cells and technical noise can obscure the true progression of the processes studied. To address this challenge, we present GeneTrajectory, an approach that identifies trajectories of genes rather than trajectories of cells. Specifically, optimal transport distances are calculated between gene distributions across the cell–cell graph to extract gene programs and define their gene pseudotemporal order. Here we demonstrate that GeneTrajectory accurately extracts progressive gene dynamics in myeloid lineage maturation. Moreover, we show that GeneTrajectory deconvolves key gene programs underlying mouse skin hair follicle dermal condensate differentiation that could not be resolved by cell trajectory approaches. GeneTrajectory facilitates the discovery of gene programs that control the changes and activities of biological processes.
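To make the idea of an optimal transport distance between gene distributions concrete, the following minimal sketch computes a Wasserstein-1 distance between toy genes. It rests on strong simplifying assumptions that are not taken from the paper: the cell–cell graph is reduced to a path graph with unit-length edges, on which the distance has a closed form, and the gene names, counts and function names are hypothetical. The GeneTrajectory package works on a general cell–cell graph and should be consulted for the actual procedure.

```python
from itertools import accumulate


def normalize(counts):
    """Turn a gene's per-cell counts into a probability distribution over cells."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("gene has no expression in any cell")
    return [c / total for c in counts]


def wasserstein_on_path(p, q):
    """Wasserstein-1 distance between two distributions on a path graph.

    Assumption: cell i is connected only to cell i + 1, with edge length 1.
    The mass that has to cross edge (i, i + 1) is then the absolute value of
    the cumulative difference up to cell i, so the distance is their sum.
    """
    diffs = [a - b for a, b in zip(p, q)]
    cumulative = list(accumulate(diffs))
    return sum(abs(c) for c in cumulative[:-1])


# Toy data: three hypothetical genes measured in six cells ordered along the path.
early = normalize([5, 3, 1, 0, 0, 0])
middle = normalize([0, 1, 4, 4, 1, 0])
late = normalize([0, 0, 0, 1, 3, 5])

d_early_middle = wasserstein_on_path(early, middle)
d_early_late = wasserstein_on_path(early, late)
print(f"early vs middle: {d_early_middle:.3f}")
print(f"early vs late:   {d_early_late:.3f}")
```

In this toy setting, genes that are expressed at nearby positions along the path are close to each other, which is the property that lets a gene–gene distance matrix be organized into an ordered trajectory.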
Data availability
The human PBMC scRNA-seq dataset is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3. The mouse embryonic skin dataset generated and analyzed in this study is available from the Gene Expression Omnibus with the accession GSE255534. The processed Seurat data objects for these two datasets are available at Figshare (https://doi.org/10.6084/m9.figshare.25243225). The Cyclebase gene list was extracted from Supplementary Table 5 in ref. 60.
Code availability
The R package of GeneTrajectory and the code used for data analysis are available on GitHub (https://github.com/KlugerLab/GeneTrajectory).
Acknowledgements
The authors thank J. Yang and M. Roulis for fruitful discussions. This study was supported by the National Institutes of Health (NIH) under grants R01GM131642 (to Y.K. and X.C.), UM1DA051410, U54AG076043, U54AG079759, P50CA121974 and U01DA053628 (to Y.K.). X.C. is also partially supported by the National Science Foundation (NSF) grant DMS-2237842. P.M. is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) grant R01AR076420.
Ethics declarations
Competing interests
R.A.F. is an advisor to GlaxoSmithKline, Zai Lab and Ventus Therapeutics. F.S. is employed as a director by PCMGF Limited. I.D.O. is the founder and president of Plythera and receives research funding from Ventus Therapeutics and SenTry.
Extended data
Extended Data Fig. 1 Simulation framework and dataset visualization.
a. Illustration of the GeneTrajectory simulation framework. A simple linear differentiation process simulation is shown. Each cell is associated with a pseudotime t along the process. For each gene, the expected expression level is modeled as a bell-shaped function of t, and the observed expression level in a given cell is drawn from a Poisson distribution (see details in Methods). b. GeneTrajectory analysis of the simulated data in a. The first panel shows the UMAP embedding of cells; the second panel delineates the progressive dynamics of the simulated biological process with five genes selected along each process; the third to seventh panels show the expression of the selected genes in the cell embedding following their pseudotemporal order; the eighth panel displays the UMAP embedding of genes, colored by the ground truth of gene pseudo-order. c–f. Gene-by-cell count matrices visualized as heatmaps (in log scale). Each row corresponds to a gene and each column to a cell. Each heatmap corresponds to a simulation example in Fig. 2.
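The simulation described in panel a can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the Gaussian form of the bell-shaped curve, the use of the expected level as the Poisson mean, and all parameter values and function names (`width`, `max_level`, `simulate_counts`) are assumptions made here; the exact specification is in the Methods.

```python
import math
import random


def expected_expression(t, peak_time, width=0.1, max_level=20.0):
    """Bell-shaped mean expression at pseudotime t (Gaussian bump, an assumption)."""
    return max_level * math.exp(-((t - peak_time) ** 2) / (2 * width ** 2))


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_cells=200, n_genes=10, seed=0):
    """Simulate a gene-by-cell count matrix for one linear process."""
    rng = random.Random(seed)
    # Each cell gets a pseudotime in [0, 1); cells are sorted for readability.
    pseudotimes = sorted(rng.random() for _ in range(n_cells))
    # Genes peak at evenly spaced times, which defines the ground-truth gene order.
    peak_times = [(g + 0.5) / n_genes for g in range(n_genes)]
    counts = [
        [sample_poisson(expected_expression(t, peak), rng) for t in pseudotimes]
        for peak in peak_times
    ]
    return pseudotimes, peak_times, counts


pseudotimes, peak_times, counts = simulate_counts()
print(len(counts), "genes x", len(counts[0]), "cells")
```

Rows of `counts` correspond to genes and columns to cells, matching the layout of the heatmaps in panels c–f.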
Extended Data Fig. 2 Myeloid cell type stratification.
a. UMAPs of selected well-studied myeloid gene markers identified along gene trajectories. b. Heatmap of cell-type specific gene markers (showing for each cell type the genes with the highest fold change in the average expression between that cell type and the remaining ones). c. Dot plot of cell-type specific gene markers in b. The color here indicates the average expression level of each gene in the corresponding cell type (after scaling).
Extended Data Fig. 3 Dermal cell type stratification.
a. UMAPs of gene expression profiles. b. Distribution of genes associated with different cell cycle phases along the CC gene trajectory.
Extended Data Fig. 4 Gene dynamics comparison between the wild type and Wls mutant.
a. Gene bin plots of the LD gene trajectory, split by condition. b. Gene bin plots of the CC gene trajectory, split by condition. c. Cell UMAPs colored by cell state, with states categorized into multiple stages, split by condition. d. Change in Lef1 (Wnt) level across all stages, split by condition. The Lef1 level is uniformly lower in the Wls KO than in the wild type. The box represents the interquartile range (IQR), with the line inside the box indicating the median. Whiskers extend to a maximum of 1.5× IQR beyond the box, with outliers represented as individual points.
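The box plot convention in panel d can be made explicit with a short calculation. The sketch below is an illustration on made-up numbers, not data from the study; the function name `box_stats` is hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, since plotting libraries differ slightly in how they compute quartiles.

```python
import statistics


def box_stats(values):
    """Summarize values the way a 1.5x IQR box plot draws them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical expression values with one extreme observation.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```

Here the value 30 lies more than 1.5× IQR above the third quartile, so it would be drawn as an individual point rather than included in the whisker.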
Extended Data Fig. 5 Gene ordering results obtained by different methods on the dermal condensate genesis data.
The orderings of key genes activated during the dermal condensate differentiation process are delineated. Cell cycle effects were regressed out when constructing the cell graph.
Supplementary information
Supplementary Information
Supplementary Figs. 1–7.
Supplementary Tables
Supplementary Table 1: Simulation evaluation outputs (corresponding to Fig. 2). Spearman correlation is calculated between the inferred ordering and the ground truth for each underlying process in each example (each has ten replicates).
Supplementary Table 2: Gene ordering along the DC gene trajectory. The first column lists the genes identified on this trajectory. The second column indicates the sequential order of each gene along the trajectory. The third column designates the specific gene bin to which each gene has been assigned.
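The evaluation metric used in Supplementary Table 1, the Spearman correlation between an inferred gene ordering and the ground truth, can be sketched as below. The example values and function names are hypothetical and only show how the score behaves; they are not results from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: ground-truth order of six genes and an inferred
# pseudo-order in which the second and third genes are swapped.
truth = [1, 2, 3, 4, 5, 6]
inferred = [0.05, 0.20, 0.15, 0.40, 0.70, 0.90]
rho = spearman(truth, inferred)
print(f"Spearman correlation: {rho:.3f}")
```

Because the score depends only on ranks, it is unaffected by the scale of the inferred pseudo-order and equals 1 only when the two orderings agree exactly.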
About this article
Cite this article
Qu, R., Cheng, X., Sefik, E. et al. Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02186-3
DOI: https://doi.org/10.1038/s41587-024-02186-3