Abstract
Pseudotime analysis with singlecell RNAsequencing (scRNAseq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many methods have been developed to infer the pseudotemporal trajectories of cells within a biological sample, it remains a challenge to compare pseudotemporal patterns with multiple samples (or replicates) across different experimental conditions. Here, we introduce Lamian, a comprehensive and statisticallyrigorous computational framework for differential multisample pseudotime analysis. Lamian can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions while adjusting for batch effects, and to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for crosssample variability and hence substantially reduces samplespecific false discoveries that are not generalizable to new samples. Using both real scRNAseq and simulation data, including an analysis of differential immune response programs between COVID19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.
Similar content being viewed by others
Introduction
Singlecell RNAsequencing (scRNAseq) enables the dissection of complex cellular programs at singlecell resolution in biological samples with heterogeneous cell compositions. When cells in a sample come from a continuous biological process, computationally placing the cells along a pseudotemporal trajectory based on their progressively changing transcriptomes is a powerful approach to reconstructing the dynamic gene expression programs of the underlying biological process. This approach, also known as pseudotime analysis^{1,2,3}, is now widely used to study cell differentiation^{4,5,6}, immune responses^{7,8}, disease development^{9,10,11,12}, and many other biological systems with temporal dynamics. A systematic review and comparison of these methods can be found in a recent benchmark study^{3}. The majority of existing methods were designed to infer gene expression changes along the reconstructed trajectory within one biological sample. However, scRNAseq experiments today standardly generate data with multiple biological samples across multiple conditions. For example, a number of COVID19 studies generated scRNAseq data from multiple patients with differential disease severity levels^{13,14,15,16,17,18,19}. Therefore, there is an increasing demand for methods that can simultaneously (i) take into account sampletosample variation and (ii) identify changes in pseudotemporal trajectories across conditions. To meet this demand, two challenges need to be addressed.
First, changes in pseudotemporal trajectories across conditions can occur in multiple ways, including (i) topological differences, such as a cell lineage along differentiation is lost (or added) in one sample group compared to another group, (ii) changes in the proportion (or density or abundance) of cells along a cell lineage across conditions, and (iii) changes in the gene expression itself along pseudotime across conditions. An ideal solution would address all three types of changes in one comprehensive statistical framework.
Second, in order to separate changes of biological interest (e.g. difference between treatment and control) from other biological or technical noises, it is important to account for naturally occurring samplelevel variations, not of interest (e.g. sampletosample variation within the treatment or control group), unwanted technical variations (e.g. batch effects), and other uncertainties in the analysis (e.g. uncertainties of the inferred trajectory and pseudotime).
However, there currently does not exist a comprehensive integrative framework that identifies all three types of changes in pseudotemporal trajectories (topology, cell density, and gene expression) across experimental conditions with multiple samples per condition, while also accounting for samplelevel variability.
Although there exist pseudotime analysis methods to detect changes in gene expression along pseudotime (e.g. Monocle^{20,21,22}, TSCAN^{23}, Slingshot^{24}), in cell abundance along pseudotime (e.g. milo^{25}, DAseq^{26}), and in trajectory lineages (e.g. tradeSeq^{27}), most methods do not investigate changes across conditions. Almost all methods ignore sampletosample variation by either only analyzing cells from a single sample or treating cells from multiple samples as if they were from a single sample. For the latter, cells from different samples are usually integrated in a lowdimensional space by removing both biological and technical differences among samples, and a trajectory is then inferred to characterize dynamic cellular programs along pseudotime, without considering variability among samples.
Phenopath^{28} and condiments^{29} are two pseudotime methods capable of identifying changes across conditions. Condiments assumes that each condition has one sample and therefore does not consider sampletosample variation within each condition when each condition has multiple replicates. Ignoring samplelevel variability can result in false discoveries not generalizable to new samples. Phenopath assumes gene expression changes linearly along pseudotime and cannot deal with arbitrary differences between conditions which may be nonlinear functions of pseudotime. Moreover, it does not estimate samplelevel variance separately from celllevel variance. Thus, similar to condiments, one cannot assess whether the observed difference between conditions is real or expected by chance based on the random samplelevel variability within each condition. Although properly accounting for the variation across samples is important in multisample singlecell data, neither PhenoPath nor condiments can meet this need.
Pseudotime inference itself also has uncertainties. Recently, PseudotimeDE^{30} has been proposed to account for pseudotime reconstruction uncertainties in singlesample pseudotime analysis via subsampling cells and permuting pseudotime. However, this approach does not consider multiple samples and therefore does not characterize variability and differences across samples.
To address these gaps, we introduce a comprehensive and integrative statistical framework, referred to as Lamian, for differential multisample pseudotime analysis. Lamian is named after a traditional Chinese handpulled noodle. The name is chosen based on the similarity between the process of making Lamian noodles and our statistical model in which multisample singlecell data are described using multiple smooth noodlelike functional curves (Fig. S1). Given scRNAseq data from multiple biological samples with known covariates, such as age, sex, sample type, and disease status, Lamian can be used to (1) construct pseudotemporal trajectories and evaluate the uncertainty of the topologies, (2) evaluate changes in the topological structure associated with sample covariates, (3) describe how gene expression and cell density change along the pseudotime, and (4) characterize how sample covariates modify the pseudotemporal dynamics of gene expression and cell density. Importantly, when identifying gene expression or cell density changes, Lamian accounts for variability across biological samples. As a result, Lamian is able to more appropriately control the false discovery rate (FDR)^{31} when analyzing multisample data, a property not offered by other existing methods.
Results
Lamian: a statistical framework for differential pseudotemporal trajectory analysis in multiple samples
Lamian consists of four modules tackling different aspects of multisample pseudotime analysis (Fig. 1). The input for Lamian includes (1) a lowdimensional representation of cells, such as principal components (PCs) or other lowdimensional embeddings of the scRNAseq data from multiple samples that have been harmonized into a common space using methods such as Seurat^{32}, Harmony^{33} or scVI^{34},
(2) the normalized scRNAseq gene expression matrices, and (3) samplelevel metadata, such as covariate information corresponding to samples’ biological groups, experimental conditions, and batch indicators for batch effect correction. We assume that the data harmonization is done by users and refer readers to a recent benchmark study^{35} for guidelines on choosing the harmonization methods. Advantages of Lamian compared to existing methods (Table S1) include comprehensive solutions to evaluating tree topology uncertainty and differential topology and identifying gene expression and cell density changes associated with sample covariates while accounting for samplelevel variability.
Module 1 of Lamian uses the harmonized data to construct a pseudotemporal trajectory and then quantifies the uncertainty of tree branches using bootstrap resampling. First, cells from all samples are jointly clustered (Fig. 1a), and the clusterbased minimum spanning tree (cMST) approach described in TSCAN^{23} is used to construct a pseudotemporal trajectory. The tree can have multiple branches, allowing one to model multiple lineages of a dynamic process. Next, after users specify a tree node as the start of pseudotime or marker genes that should highly express at the start of pseudotime, Lamian will automatically enumerate all pseudotemporal paths and branches. Then, it evaluates the uncertainty of each branch by quantifying a metric we refer to as the detection rate, which is defined as the probability that a tree branch can be detected in repeated bootstrap samplings of cells (Fig. 1b). The advantages of using TSCAN to construct pseudotime include (i) the scalability of its cMST approach to a large number of cells (since the number of tree nodes in the spanning tree is determined by cell cluster number instead of cell number) and repeated bootstrap resamplings, (ii) the flexibility it provides to support both automatic and manual trajectory construction^{23}, and (iii) its overall competitive performance in multiple previous benchmark evaluations^{3,36}.
Module 2 of Lamian first identifies variation in tree topology across samples and then assesses if there are differential topological changes associated with sample covariates (Fig. 1b). For each sample, Lamian calculates the proportion of cells in each tree branch, referred to as branch cell proportion. Because a zero or low proportion can reflect the absence or depletion of a branch, changes in tree topology can be described using branch cell proportion changes. With multiple samples, Lamian characterizes the crosssample variation of each branch by estimating the variance of the branch cell proportion across samples. Furthermore, regression models can be fit to test whether the branch cell proportion is associated with sample covariates. To facilitate convenient exploration of each individual branch, one can use a binomial logistic regression to evaluate covariateassociated branch cell proportion changes for each branch separately. Alternatively, users can also use a multinomial logistic regression to analyze covariateassociated changes of cell proportion ratios between branches by considering all branches jointly (Supplementary Notes). These regressionbased methods allow one to identify tree topology changes between different conditions, for example in a casecontrol cohort, accounting for samplelevel variability. They are functions not provided by methods such as PhenoPath, condiments, and PseudotimeDE.
Given a path or branch along a pseudotemporal trajectory, the scRNAseq gene expression matrices from multiple samples, and samplelevel covariate information, Module 3 of Lamian identifies differentially expressed (DE) genes using a functional mixed effects model (Fig. 1c). There are two types of DE tests. First, the TDE test evaluates whether a gene’s activity as a function of pseudotime t, denoted as f(t), is a constant (H_{0}: f(t) = c), with the goal to identify genes whose activities change along pseudotime (H_{1}: f(t) ≠ c). Here, TDE refers to pseudotime differential expression. Second, the XDE test evaluates for each gene whether the pseudotemporal activity f(t) is associated with a samplelevel covariate, such as whether f(t) is different between healthy and disease samples. Here, XDE refers to covariate X differential expression. Currently, existing pseudotime methods, such as Monocle, Slingshot and TSCAN only detect TDE, but not XDE. PhenoPath and condiments may detect XDE but do not account for samplelevel variability in multisample studies. Lamian is an integrative framework to provide both TDE and XDE for multiple sample analyses. For each XDE gene, Lamian further evaluates whether the sample covariate shifts the mean of f(t) (referred to as a mean shift) or changes the functional form of f(t) (referred to as a trend difference) or both. Additionally, unsupervised clustering (kmeans by default, Louvain and Gaussian mixture model clustering are provided as optional) is applied to DE genes to group and summarize major differential gene patterns. In all DE tests, Lamian accounts for sampletosample variation directly in its model framework, whereas the other methods do not. Consequently, Lamian is able to better control the false discovery rate (FDR)^{31} compared to existing methods that ignore sampletosample variation which leads to identifying false discoveries that are not generalizable in new samples. By default, Lamian uses a permutation approach to determine statistical significance of the DE tests (Lamian.pm). This approach is more reliable but can be computationally slow. For fast computation, Lamian also provides an option to determine significance using the chisquared distribution as the asymptotic null for the likelihood ratio statistics (Lamian.chisq). This option is fast but less accurate. It can be used when users want to run a quick initial analysis while waiting for more rigorous results from Lamian.pm, especially when dealing with a large dataset. Below Lamian refers to Lamian.pm unless otherwise specified.
Similar to gene expression, Module 4 of Lamian tests whether cells’ density along pseudotime is uniformly distributed or not (TCD test), and if it is associated with a sample covariate (XCD test). This may be used to study dynamic processes, such as cell expansion in immune response or how disease changes the pseudotemporal cell density pattern.
In all differential analyses, unwanted technical variations such as batch effects or other confounding variables can be adjusted by regressing them out in the Lamian regression model.
Lamian estimates tree topology stability and accurately detects differential tree topology
We begin with illustrating Modules 1 and 2 of Lamian using a Human Cell Atlas (HCA)^{37,38} 10x Genomics scRNAseq dataset, referred to as HCABM, consisting of bone marrow samples from 8 donors (4 females and 4 males) and a total of 32,819 cells. Bone marrow contains hematopoietic stem cells (HSCs) differentiating into different blood cell types, creating a natural branching structure. This dataset along with the existing biological knowledge about this system therefore can be used to demonstrate and evaluate Lamian’s ability to analyze a trajectory with branches.
First, we construct the pseudotemporal trajectory and assess the tree topology stability (Module 1). Applying TSCAN to the Seuratharmonized bone marrow data, we identified 6 cell clusters (Fig. 2a), which form a minimum spanning tree with three branches, corresponding to the three major lineages of HSC differentiation  myeloid, erythroid, and lymphoid (Fig. 2b). We confirmed these lineages with known marker genes (Figs. 2c, S2). Specifically, HSCs are mostly in cluster 5, as indicated by high CD34 expression (Fig. 2c). By setting cluster 5 as the origin, we obtained three pseudotemporal paths (Fig. 2a: the path of cluster 5 → 1; 5 → 6 → 2; 5 → 3 → 4). Lamian uses repeated bootstrap sampling of cells along the branches to calculate a detection rate. In the HCABM data, these three branches can be detected in 93.8% (5 → 1), 95.3% (5 → 6 → 2), and 61.5% (5 → 3 → 4) in all bootstrap samples (or with a detection rate = 0.938, 0.953 and 0.615), suggesting that they are real and can be reliably detected from data. Note that although TSCAN is scalable to a large number of clusters as tree nodes and can handle more complex tree structures, increasing tree complexity can also introduce noise and produce many unreliable branches with low detection rates (Fig. S3). Therefore, we proceed with the three branches here as their presence is robustly supported by the available data and also consistent with known biology.
Next, we assess the variability in the branch cell proportions across samples and between conditions (Module 2). Using all 8 donors, the branch cell proportion is 41.1%, 48.4%, and 10.5% for the myeloid, erythroid, and lymphoid branches, respectively. Of note, the proportions show variation across donors (proportion Mean (SD) = 0.41 (0.10) for myeloid, 0.48 (0.11) for erythroid, 0.11 (0.01) for lymphoid). Lamian allows one to assess if there is a statistically significant difference in the tree topology (i.e. branch cell proportion) between two sample groups. As an example, comparing the branch cell proportion between male and female donors in the HCABM data by applying the binomial logistic regression to each branch did not show significant differences along the myeloid, erythroid, and lymphoid lineages (pvalues = 0.35, 0.64, 0.94, respectively), suggesting that there is no significant change in tree topology between the two sexes (Fig. 2d). Using multinomial logistic regression showed similar results (pvalues for odds ratios between male and female = 0.20, 0.39 for myeloid and lymphoid, respectively, using erythroid as the baseline since by default Lamian uses the most abundant branch as the baseline category in multinomial logistic regression).
To demonstrate the validity of Lamian’s topology stability and differential topology analysis, we performed two sets of simulations. In Simulation 1, we subsampled cells in the myeloid lineage in the HCABM data to reduce the myeloid cell number while retaining all cells in the erythroid and lymphoid lineages (Fig. 2e, f). As expected, decreasing the number of cells decreased the detection rate for the myeloid branch (Fig. 2g). For example, when 80% cells in the myeloid lineage were reduced, the detection rate dropped to 0.106 (Fig. 2e, g). Hence, the detection rate provides a reasonable measure for quantifying the certainty (or uncertainty) conveyed by the data about the presence of a branch.
In Simulation 2, we reduced the number of cells in the myeloid lineage in four out of the eight samples while retaining all cells in the other two lineages (Fig. 2f). As the number of cells decreased, the detection rate of the myeloid branch again decreased, but at a much slower rate compared to Simulation 1 (Fig. 2g). We found that conditional on the branch being detected, our differential topology tests (Module 2) were able to detect differences in the branch cell proportion between the two groups of samples in this simulation scenario. Most importantly, they controlled the probability of false positives (type I error rate) when there were no differences (i.e. removing no cells or 0% of cells) and also had increasing statistical power to detect true positives as we increased the percent of cells removed in half of the samples (Fig. 2h).
Lamian comprehensively detects differential pseudotemporal gene expression and cell density
We next illustrate how Lamian adjusts for sampletosample variation to identify differential gene expression (Module 3: TDE and XDE tests) and differential cell density (Module 4: TCD and XCD tests) along pseudotime using the eight samples in the HCABM dataset.
First, we ask which genes are varying along pseudotime (Module 3: TDE test). We reasoned that a proper TDE analysis should be able to identify transcriptional programs associated with lineage specification. Applying the TDE test with a 5% FDR cutoff, Lamian identified 8475, 7454 and 8953 TDE genes for the myeloid, erythroid, and lymphoid lineage, respectively (Fig. 3a–c). Among the TDEs, we found known lineage markers corresponding to each lineage, such as CD14 for myeloid, HBB and GATA1 for erythroid, and CD3D, CD19, CD27 for lymphoid. Hence, TDE genes can be used to identify branch lineages in the tree topology. Unsupervised clustering of TDE genes and gene ontology (GO) analysis revealed the dynamic transcriptional programs associated with each lineage (Fig. 3a–c, Fig. S4). For example, as HSCs differentiate to the erythroid lineage, the TDE genes with increasing expression along pseudotime are enriched in red blood cellrelated functions such as oxygen transport, whereas genes with functions in other lineages (e.g. CD8positive, alphabeta T cell activation, regulation of B cell receptor signaling pathway) show decreasing expression suggesting that they are increasingly suppressed (Fig. S4c, d). Meanwhile, for the lymphoid lineage, the TDE genes with increasing expression along pseudotime are enriched in T lineage commitment, whereas genes with decreasing expression lack enrichment of lymphocytespecific functions (Fig. S4e, f).
Next, we tested whether there are differential gene expression patterns along pseudotime associated with sex as a covariate (Module 3: XDE test). Currently, which genes are sexassociated XDE genes in this system is not completely known. However, we reasoned that if there is no sexassociated XDE gene, then any XDE gene reported by the algorithm would be noise, and a priori one would not expect genes that are random noise to be associated with sex chromosomes. On the other hand, if XDE genes reported by Lamian in a genomewide analysis are found to be enriched in sex chromosomes, it would suggest that sexassociated XDE genes exist and the algorithm is able to detect true XDE signals. For each gene, Lamian reports three FDRs: (1) FDR_{overall} corresponds to testing if a gene is XDE (overall test), (2) FDR_{trend} corresponds to testing if an XDE gene has significant trend difference associated with the sample covariate (trend test), and (3) FDR_{mean} corresponds to testing if an XDE gene has significant mean shift associated with the covariate (mean test). In addition, there are two other categories: both mean and trend differences (bothSig), or neither mean or trend differences (otherSig). Using the XDE test, Lamian identified 43, 32 and 29 genes (overall test) with significant differences (at the 5% FDR_{overall} cutoff) between male and female along the myeloid (Fig. 3d), erythroid (Fig. S5a), and lymphoid (Fig. S5b) lineages, respectively. Next, Lamian further annotated the XDE genes into the gene patterns described above. For the myeloid lineage, this results in 10 genes with mean shift only, 16 genes with trend difference only, and 12 genes with significant changes both in mean and trend (Fig. 3d,e). Among the XDE genes, 33% (N=14) are from chromosome X and Y, representing a significant enrichment in sex chromosomes (Fig. 3d, permutation test pvalue = 0.0036 for chromosome X and p = 0.000 for chromosome Y, see Methods). Notably, among the genes that show significant mean shift (with or without trend difference), 12 genes have higher mean expression in males and they consist of 8 genes on Y chromosome and 4 genes on autosomes. Likewise, 10 genes have higher mean in females and they consist of 3 genes on X chromosome and 7 genes on autosomes (Fig. 3d). Unsupervised clustering of XDE genes revealed cascades of their dynamic transcriptional programs. For example, among genes with trend difference only, the difference in SHISA5 expression between female and male was positive at the beginning and negative at the end of the pseudotime, whereas the difference in DUSP11 was negative at the beginning and positive at the end (Fig. 3d). Analyses of the erythroid and lymphoid lineages yielded similar results (Fig. S5). Among the XDE genes, a number of them have been reported to have functions related to hematopoietic stem and progenitor cells (e.g. ALS2^{39}, DDX3Y^{40}, ZFX^{41}). Our analysis here suggests that their functional activities may be sexdependent.
Finally, we tested for changes in cell density both along the pseudotime (Module 4: TCD test) and whether these patterns were associated with sex as a sample covariate (Module 4: XCD test). The TCD test shows that cell density changed significantly along all three lineages (myeloid: Fig. 3f; erythroid: Fig. S6a; and lymphoid: Fig. S6c) (all pvalues after adjusting for multiple testing are < 2.22 × 10^{−308}), although it is unclear whether the cell density change was due to technical sampling bias (e.g. certain cell types are easier to sample) or real biology. We asked whether the cell density changes were correlated with changes of cell cycle along pseudotime but did not find clear correlation (Fig. S7). In the XCD test, we did not find significant differences in cell density along pseudotime between male and female (myeloid: Fig. 3g; erythroid: Fig. S6b; and lymphoid: Fig. S6d).
Lamian is more powerful than existing methods to detect differences while controlling the FDR by accounting for samplelevel variation
In this section, we demonstrate that Lamian is more powerful than existing methods to detect gene expression differences that are associated with a covariate (Module 3: XDE test). Robustly comparing methods requires datasets with a sufficiently large number of known differential and nondifferential genes to serve as the ground truth. Unfortunately, such datasets are not widely available. To address this, we combine simulations with the real HCABM data for method evaluation. The HCABM dataset is unique in that its male and female samples allow a betweensex comparison. Since there are many sex chromosome genes, the enrichment of sexassociated XDE genes in sex chromosomes can provide an objective and relatively robust benchmark to compare different methods. Thus, the HCABM data is used in this article for both method demonstration and evaluation. In supplementary data, we also demonstrate how incorporating the the sampletosample variation into the differential gene expression test along pseudotime (Module 3: TDE test) leads to less false discoveries compared to existing methods that also perform TDE detection.
For XDE analysis, we compared Lamian with limma^{42}, Monocle2^{21}, tradeSeq^{27}, Phenopath^{28}, and condiments^{29}. For Lamian, we also compared two ways to compute pvalues and FDR: Lamian.pm (default) and Lamian.chisq. As limma is designed to detect differential mean gene expression, we pooled all cells on a pseudotemporal path or branch to create a pseudobulk expression profile (i.e. the average expression across cells for a gene) for each sample. In this way, limma uses the pseudobulk data to detect mean differences between two sample groups. tradeSeq (which is used by Slingshot) is a method originally developed for comparing different branches of a pseudotemporal trajectory within a single sample. Here, we tailored the function to compare the same branch in a pseudotemporal trajectory between two samples. Since tradeSeq and condiments do not consider crosssample variability, cells from replicate samples were pooled and treated as if they came from a single sample for both methods. Phenopath was run by specifying each cell’s sample origin and sample group label and exporting sample groupassociated genes. Monocle2 does not directly handle XDE analysis, but we tailored its model and created a new function “monocle2TrajTestCorr” to allow XDE detection in our data (see Methods).
First, we created a null data set based on the HCABM data (Methods). Briefly, we first randomly partitioned the eight HCABM samples into two groups and removed the group differences to create a dataset where we do not expect any XDE genes between the two groups (Fig. 4a). When Lamian (Lamian.pm) was applied to detect group differences, no XDE genes were reported at 5% FDR cutoff. Using the same cutoff, Lamian.chisq reported 62 XDE genes. By contrast, other methods reported 7846 (monocle2TrajTestCorr), 8783 (tradeSeqPatternTest), 7259 (tradeSeqEarlyDETest), 5822 (tradeSeqDiffEndTest), 7400 (PhenoPath, max.iter = 500, without setting max.iter the program cannot provide results), and 8753 (condiments) differential genes, which are all false positives. Similar to Lamian, limma reported no XDE genes. However, as will be shown below, limma can only detect differences in mean expression and cannot detect trend differences in pseudotemporal patterns.
Building upon the null data set above, we then introduced in silico spikein differential signals with varying strengths and pseudotemporal patterns between the two sample groups to a random set of genes (details in Methods). In this way, we know which genes are XDE genes and whether they have mean shift, trend difference, or both (Fig. 4b). Next, we applied Lamian to identify XDE genes and clustered genes based on their differential patterns using the default kmeans clustering (Fig. 4c, d). Using Gaussian mixture model and Louvain clustering yielded similar results (Figs. S8, S9). We compared Lamian XDE genes with XDE analysis from other methods. For all three tests (overall test, trend test, mean test), and across all signal strength levels, the real FDR was smaller than the FDR reported by Lamian(Lamian.pm), demonstrating that Lamian was able to conservatively estimate FDR (Fig. 4e, g). Lamian.chisq also provided reasonable FDR estimates but it slightly underestimated the real FDR. The other methods do not report separate FDRs for mean and trend differences. TradeSeq can be run to detect different types of DE: earlyDETest identifies genes that show expression difference in early pseudotime; patternTest identifies genes that show expression difference along all pseudotime that are equallyspaced; diffEndTest compare the average expression at the end stage of pseudotime. It assigns an FDR for each test. Each of the other methods reports an overall FDR for each gene (see Methods). Unlike Lamian, all existing methods underestimated the real FDR: the difference between the real FDR and their reported FDR was positive in most cases (Fig. 4e). We also stratified XDE genes into three groups  mean shift only, trend difference only, and both mean and trend differences  based on their true states. Within each stratum, the FDR_{overall} reported by Lamian conservatively estimated the real FDR, whereas the other methods underestimated the real FDR (Fig. 4e).
We further compared the statistical power of detecting differences in temporal gene expression associated with samplelevel covariates via the sensitivityrealFDR curve and the area under the curve (AUC) (Fig. 4f, h). The power of detecting XDE genes by Lamian increased with increasing signal strength, both for detecting XDE genes overall or for detecting a specific class of XDE genes (Fig. 4f). For detecting all XDE genes (overall test), all competing methods had lower power compared to Lamian (Fig. 4f). Within Lamian, Lamian.pm slightly outperformed Lamian.chisq. In the remaining methods, monocle2TrajTestCorr was among the top but it failed to control FDR. When XDE genes are stratified, limma had comparable power to Lamian for detecting XDE genes with mean shift (i.e. mean shift only or both mean and trend differences) but had zero power to detect genes with trend difference only. TradeSeq and condiments both had lower power than Lamian in all XDE gene categories (Fig. 4f).
In addition to our simulation studies, we compared different methods using the real HCABM dataset to detect sex differences (Fig. S10). For the myeloid lineage, limma detected 5 XDE genes and all of them were found by Lamian. Lamian reported an additional 38 genes not found by limma (25 with trend difference, 9 with mean shift only) (Fig. S10a). XDE genes found by Lamian but not limma showed significant enrichment in sex chromosomes (Fisher’s exact test p−values: chrX 0.023, chrY 1.05 × 10^{−12}), suggesting that these genes are indeed sex related. TradeSeq, condiments, PhenoPath and monocle2TrajTestCorr reported 3677, 4226, 10661 and 10502 XDE genes, respectively. However, a closer examination of their results indicates that a subset of these genes are false positives (Figs. 4j, S11). For example, BCLAF1 was reported as XDE by condiments. For this gene, when cells from replicate samples were treated as if they were from one sample, the fitted gene expression curve along pseudotime are different between male and female, which explains why condiments reported the gene as XDE. However, when the gene expression curve is fitted within each sample, the variation among replicate samples is much bigger than the difference between male and female and hence there is no real statistically significant sex difference (Fig. 4j). In contrast, RCHY1 was an XDE gene reported by Lamian but not condiments, the sex difference is clear even after accounting for sample variability (Fig. 4j). Overall, XDE genes reported by Lamian and limma showed the most significant overlap with both chromosome X and chromosome Y (Fig. 4i). The performance of Lamian on the other two lineages was similar (Fig. S12). Indeed, only Lamian and limma showed significant overlap with both sex chromosomes in all three lineages. Additionally, Lamian also showed the largest overlap with genes escaping Xchromosome inactivation (XCI), further demonstrating its top performance in detecting sexassociated XDE (Fig. S13). Collectively, our analyses demonstrate that Lamian is better able to detect XDE genes compared to the other existing methods.
In addition to detecting differentially expressed genes along pseudotime that are associated with a sample covariate, Lamian can also detect differentially expressed genes along pseudotime without any covariate information (Module 3: TDE test). In this case, there are existing methods, such as Monocle, Slingshot, tradeSeq, TSCAN and PseudotimeDE that perform a similar test. However, unlike these existing methods, Lamian incorporates sampletosample variability into the statistical estimation framework. Using simulated data with multiple samples, we found that Lamian, compared to existing methods, controls the FDR, while also maintaining strong statistical power for TDE detection (Supplementary Notes, Fig. S14).
Finally, similar to DE analysis, our evaluation also shows that Lamian can accurately detect TCD and XCD with a wellcontrolled type I error rate and high statistical power (Supplementary Notes, Fig. S15).
Lamian analysis of COVID19 scRNAseq data identifies differential CD8 T cell transcriptional programs during a critical stage of disease severity transition
To further demonstrate and evaluate Lamian’s ability to detect differences associated with sample covariates along a continuous process, we applied Lamian to a COVID19 peripheral blood mononuclear cell (PBMC) 10x Genomics scRNAseq dataset obtained from a recent study^{43}. The COVID19 disease severity of a patient may progress from mild to moderate to severe. It was reported that the mild to moderate transition is a critical stage with rapid immune landscape changes that may determine the trajectory of disease progression^{43}. CD8+ T cell activation is an important component of COVID19 patients’ immune response to the infection. By analyzing scRNAseq data from 66 mild and 48 moderate COVID19 patients, we examined the CD8+ T cell activation program in these patients and asked how it changes during the mildtomoderate disease severity transition. The relatively large sample size of this dataset also allowed us to partition samples into nonoverlapping subsets and systematically benchmark different methods’ ability to detect XDE genes by evaluating the detection consistency between different sample subsets.
First, we constructed a pseudotemporal trajectory using a total of 55,953 naive and CD8+ T cells identified from the harmonized PBMC scRNAseq data (Fig. 5a, Methods). The trajectory contains only one path without branch, thus we skip evaluating the tree branch uncertainty and differential topology. TCD analysis shows statistically significant changes in cell density along the trajectory (Fig. 5d, p < 2.22 × 10^{−308}). It is unclear whether the cell density change here was due to technical sampling bias or has any biological meaning, but the density change was not correlated with cell cycle (Fig. S7). Applying TDE test, Lamian identified 2195 TDE genes which were grouped into five clusters (Fig. 5b). Examination of these genes’ dynamic expression patterns show that the inferred pseudotemporal trajectory reflects the CD8+ T cell activation process. For example, known naive/memory T cell associated genes including TCF7, SELL and IL7R were found in cluster 1 (Fig. 5b, c). Genes in this cluster showed decreasing expression along pseudotime, consistent with the loss of quiescent characteristics over the activation process. Genes such as JUNB and CD7 are responsible in the induction of differentiation into effectors and thus catch up expression shortly in cluster 2. Genes in cluster 2 also include early activation marker CD69, GZMK and AP1 family members (e.g. JUNB, JUN), suggesting that this cluster plays a role in the cell fate switch from effector memory T cells to terminal effector T cell phase. By contrast, genes in clusters 4 and 5 both show increasing expression along pseudotime, with cluster 5 reaching its peak expression later than cluster 4. We found that genes encoding functional effector molecules such as CCL5 and IFNG are enriched in cluster 4, and cluster 5 is enriched in both functional activation features such as GZMB, TBX21 and CX3CR1 and terminal differentiation gene features such as GNLY, CD244 and CD38 (Fig. 5c).
We next investigated differences in the CD8+ T cell activation program between mild and moderate patients. The analysis of cell density using XCD test shows that the abundance of activated effector T cells is significantly increased in moderate compared to mild disease (p = 1.38 × 10^{−61}, Fig. 5d). The analysis of gene expression using XDE test identifies 1315 XDE genes, which were grouped into 14 clusters (Fig. 5e). The first 12 clusters contain genes with pseudotemporal trend differences (including bothSig and trendSig), and their trend differences follow 6 major patterns (Fig. 5f, e.g. cluster 2a and 2b have the same trend difference pattern, but cluster 2a has no significant mean shift whereas cluster 2b has significant mean shift). The last 2 clusters contain genes with mean shift only. In cluster 1, TBET (TBX21) and ZEB2 are major transcription factors (TF) for CD8 T cell effector responses^{44,45,46,47} and drive IFNG production. Genes in this cluster tend to have lower expression in moderate patients compared to mild patients and the magnitude of difference increases along the pseudotime (Fig. 5e, f), suggesting that mild patients have a more robust functional effector CD8 T cell response. In cluster 6 (incl. 6a and 6b), several interferon stimulated genes such as IFI6 and ISG15 as well as terminal differentiation transcription factor BLIMP1 (encoded by PRDM1)^{48} become increasingly more upregulated in moderate patients compared to mild patients along pseudotime, suggesting that a stronger inflammation in moderate patients drives CD8 T cell termination. Together, these data indicate that compared to mild disease, CD8 T cells in moderate COVID19 patients are programmed to be less functional effectorlike and more terminally differentiated. This is consistent with previous observation that comparing to the COVID19recovered donors, ongoing disease patients show a more TEMRA differentiation with less Tbet+ functional effector CD8 T cells^{49}.
We further compared Lamian with the other XDE detection methods. We first randomly partitioned the COVID samples into two sets and detected XDE genes between mild and moderate samples within each set. We then examined the proportion of overlap between the two XDE gene lists. By applying Lamian.pm and Lamian.chisq, we achieved the highest overlap proportion between the two partitioned data sets. Phenopath failed to run on this data within one week and with 400GB memory. Among the remaining methods, condiments performed slightly better than the other methods, followed by monocle2TrajTestCorr, limma, and tradeSeq, but all methods performed worse than Lamian (Fig. 5g). This suggests that XDE genes identified by Lamian are most reproducible when analyzing different sets of samples. A closer examination of the samplelevel pseudotemporal curves shows that XDE genes detected by the other methods contain a large number of false positives. Take condiments, the top performer in the remaining methods, as an example. Condiments reported 3809 genes, including 2622 that were not detected by Lamian. The samplelevel curves show that many of such genes did not show clear group differences after accounting for samplelevel variability (Fig. 5i). Lamian reported 1315 XDE genes, including 128 that were solely detected by Lamian. For these genes, group differences cannot be explained only by the samplelevel variation (Fig. 5j).
Collectively, our analyses demonstrate that Lamian provides a powerful tool for identifying differences associated with covariates that the other methods do not offer. The COVID analysis also demonstrates how one can use multisample differential pseudotime analysis to understand dynamic gene expression programs in a disease.
Lamian analysis of tuberculosis data demonstrates efficiency in handling large datasets while adjusting for batch effects
To demonstrate and evaluate Lamian ’s ability to analyze large datasets and detect differences associated with sample covariates while adjusting for potential confounders such as batch effects, we analyzed an atlassize dataset consisting of 337,191 memory T cells from 184 donors (100 females and 84 males) in a tuberculosis (TB) progression cohort^{50} (Fig. 6a, b). This dataset has recently been used for demonstrating covarying neighborhood analysis and biologically meaningful cell abundance differences between males and females were reported along the second principal component of the covarying neighborhood abundance matrix (NAMPC2)^{51}. Consistent with that study, we provided NAMPC2 as cells’ pseudotime and conducted differential analysis (Fig. 6a). Samples in this dataset are profiled in multiple batches (Fig. 6b). We added batch indicators to the Lamian regression model to account for batch effects.
TCD analysis shows that the cell density changed significantly along the trajectory (Fig. 6c), but the density change was not correlated with cell cycle (Fig. S7). Like previous examples, while the cell density change here could reflect real biology, we cannot rule out the possibility that it is due to technical sampling bias. TDE analysis shows that genes with expression elevated in the middle range of pseudotime (cluster 2) are enriched in regulation of T cell activation, and the genes with strong upregulation in the later stage of pseudotime (clusters 5) are enriched in gene ontology terms such as “immune response  activating cell surface receptor signaling pathway”, suggesting that the pseudotime reflects a T cell activation process (Fig. S16a, b). Consistent with this, typical effector transcription factors, such as ZEB2, TBX21, as well as other effector genes such as GZM family members and PRF1, all show a clear increasing pattern over the pseudotime (Fig. S16a).
Consistent with previous report^{51}, XCD test revealed significant cell abundance changes between males and females along the pseudotime. T cells from females were more enriched towards naive status (early pseudotime) compared to T cells from males. By contrast, male cells were more enriched towards terminal activation status (late pseudotime) (Fig. 6c).
XDE analysis between male and female identified 1120 sexassociated differential genes grouped into 14 clusters (Fig. 6d, S16c). Among them, 12 clusters had trend differences or both trend differences and mean shifts. The trend differences of these 12 clusters can be further grouped into 6 patterns. For example, pattern 4 (clusters 4a, 4b) were more highly expressed in males than females and their difference has an overall increasing and then decreasing trend (Fig. S16c). These genes were enriched in proteins targeting ER and membrane (Fig. S16d). By contrast, genes in cluster 1 were more highly expressed in females compared to males along pseudotime, and the absolute difference between female and male first increased and then decreased (Fig. S16c). These genes were enriched in gene ontology terms including lymphocyte activation, leukocyte activation and other immuneactivationrelated features (Fig. S16d), suggesting that T cells from the female group have a stronger T cell response to the disease. The data also suggest that the induction of these genes happens earlier in females along the T cell activation pseudotime. Furthermore, key T cell activation transcription factors, such as ID2 and STAT5B, were involved in this activation process, along with other functional effector molecules such as GZMA, CCL5 and GZMK (Fig. 6d). These molecularlevel discoveries are consistent with the phenotype of female patients infected with TB having a higher TH1 response feature compared to male patients^{52}. On the other hand, the increased abundance of naive T cells in females compared to males (Fig. 6c) could potentially provide a compensating mechanism to control the total amount of functional immune activation in vivo, via reducing the number of responding cells in females when the per cell effector function is high.
To compare with the other XDE methods, We randomly partitioned the samples into two sets, detected XDE genes between female and male samples within each set, and examined the proportion of overlap between the two XDE gene lists. Phenopath, tradeSeq and condiments failed to run on this atlassize data. Among the remaining methods, XDE gene rankings produced by Lamian.pm were most reproducible, followed by Lamian.chisq and monocle2TrajTestCorr (Fig. 6e, f). XDE genes reported by Lamian and limma showed significant overlap with the sex chromosome (X and Y) genes and largest overlap with the genes escaping Xchromosome inactivation, whereas monocle2TrajTestCorr did not (Figs. 6g, S17).
Finally, we compared Lamian results with and without adjusting for batch effects by examining the consistency of XDE genes between separate partitions of samples. Adjusting for batch effects improved the analysis, yielding more reproducible XDE genes (Fig. 6h).
Computational efficiency
Lamian is computationally tractable. For analyzing the HCA bone marrow dataset with 32,819 cells and 8 samples, Lamian.pm took 4.2 hours to run the whole pipeline (0.1h for trajectory variability, 2.7 h for XDE detection and 1.4h for TDE detection, 0.01h for cell density test) on a computer cluster with 25 CPUs (2.5 GHz CPU and at most 163 GB RAM combining all CPUs). Lamian.chisq is more efficient and only took 0.5 h and 6.2 GB RAM. For analyzing 39,512 CD8 T cells in the COVID dataset with 114 samples, Lamian.pm and Lamian.chisq took 37 and 2.9 hours and 285 and 5 GB RAM, respectively, to run the whole analysis pipeline. For atlassize data (more than 10^{5} cells), Lamian uses HDF5 file format to store and analyze the data to increase the computational efficiency. For the TB dataset with 337,191 cells from 184 samples, Lamian.pm can finish the analysis with 114.1 h and 243 GB memory, and Lamian.chisq took 15.8 h and 5.1 GB memory. As the TB analysis only involved Lamian modules 3 and 4, we additionally benchmarked modules 1 and 2 using synthetic data and found that they are also scalable to atlas level data (Fig. 18a, b). XDE analysis is the most timeconsuming and memoryintensive component of the whole analysis. For this component, Fig. S18c, d and Table S2 further compare the computation time and memory of different methods on different datasets. Lamian.chisq and monocle2TrajTestCorr are the fastest and Lamian.chisq requires least memory. Lamian.pm is slower but it is capable of handling atlaslevel data. Unlike Lamian, PhenoPath, condiments and tradeSeq are not scalable to large datasets. condiments and tradeSeq failed to handle the TB data, and PhenoPath failed to handle both COVID and TB data within one week and with 400GB memory.
Discussion
In summary, Lamian provides a systematic solution to multisample pseudotime analysis capable of detecting topology, gene expression and cell density differences between different conditions. In biomedical research, while making new discoveries is exciting, ensuring that the discoveries are real and replicable is equally important. One challenge the scientific community faces today is that many findings cannot be replicated or validated in independent studies^{53}. One important contributor to this problem is the flawed statistical analyses which can produce a large number of false discoveries. Such irreplicable false discoveries can be detrimental by distracting investigators from real signals and misleading subsequent research efforts, resulting in substantial waste of precious human and financial resources. In the context of pseudotime analysis, our results demonstrate that, due to lack of appropriate consideration of crosssample variability, existing pseudotime methods can report thousands of false differential genes in null simulations where the data do not contain any true differential signals. This highlights a critical gap in the pseudotime literature and an open challenge that needs to be addressed. Lamian fills in this gap by introducing a comprehensive statistical framework, including a functional mixed effects model, to account for crosssample variability in the multisample differential pseudotime analysis. In order to benchmark this method, we applied it to both simulated and real data. We note that the analyses of three real datasets (HCABM, COVID, TB) mainly serve the purpose of illustrating and evaluating Lamian, and that making new biological discoveries per se is not the focus of this study. The orthogonal information (e.g. sex chromosome genes) and large sample size (e.g. COVID and TB data) available in these data make it possible to objectively and robustly compare different methods by quantifying the overlap with orthogonal information or between independent data partitions. Our results in these simulated and real data show that the solution provided by Lamian substantially outperforms other existing methods to prioritize true discoveries and filter out false discoveries that are not generalizable to new samples.
Lamian is a free and open source R package with a modular structure. While we demonstrated its default pipeline in this article, users can replace certain analysis modules by their own data or algorithm. For sample harmonization, we used Seurat^{32} to embed cells into a common lowdimensional space. One could also use other methods such as Harmony^{33} and scVI^{34}. For example, in our HCA bone marrow analysis, using Seurat, Harmony and scVI produced similar branching structure and differential genes (Figs. S19, S20). In real applications, different harmonization methods may perform differently. We recommend users to compare different harmonization methods and choose the one most consistent with the existing knowledge. A systematic comparison of harmonization methods is beyond the scope of this study. Readers are referred to a recent benchmark study^{35} for discussions on which harmonization methods to use under different conditions.
In Lamian, TSCAN is used as the default method to construct pseudotime due to its flexibility and scalability. TSCAN uses the clusterbased MST approach to reduce the number of tree nodes (e.g. clustering 1 million cells into 1000 clusters will result in only 1000 tree nodes instead of 1 million tree nodes) and hence can handle a large number of cells. In terms of flexibility, while TSCAN by default determines the number of cell clusters automatically via an elbow method, users have the option to specify their own cluster number if they are not satisfied with the default cluster number. Increasing the cluster number may create a more complex tree with a more detailed view of the biological process. However, the increased complexity could also introduce noise and false branches (Fig. S3). In real applications, even though one may construct a more complex tree, a key question is whether one can trust that the tree structure is real rather than random noise. Answering this question is challenging when there is little or no prior knowledge about the underlying biological process. Lamian addresses this issue via bootstrap and detection rate. A low confidence tree branch can be reflected by its low detection rate. Based on our experience, applying this criterion often leads to relatively simple tree structure. This does not imply that the underlying tree structure is necessarily simple. Instead, it only reflects the fact that the available data can only provide enough information to support robust conclusion on a relatively simple tree and there is not enough information to draw conclusions on a more complex tree structure. If users have prior knowledge that supports a more complex tree structure, they can use the manual option provided by TSCAN to choose a larger cluster number to define more detailed tree structure. In addition to cluster number, TSCAN also allows users to manually specify the order of cell clusters in the trajectory, providing another way to adjust the trajectory based on users’ prior knowledge. These options in TSCAN allow users to conveniently perform analysis on more complex tree structures. Once the tree topology is given, all the remaining analyses including those in modules 2 to 4 (differential topology, XDE, TDE, XCD, TCD) can be carried out as usual.
Besides TSCAN, one also has an option to use userprovided pseudotemporal trajectories as illustrated in the TB analysis. In fact, one may use Lamian modules 3 and 4 as downstream analysis tools for other pseudotime methods such as Monocle2, Monocle3 and slingshot. However, Lamian modules 1 and 2 which construct trajectory and quantify its uncertainty and variation currently do not support other pseudotime methods due to various issues including scalability and implementation challenges (Supplementary Notes). For example, slingshot, a popular MST method similar to TSCAN, does not scale well to bootstrap due to its timeconsuming principal curve fitting. For Monocle2 and Monocle3, modifying Lamian ’s trajectory topology uncertainty module to support them is nontrivial due to lack of interoperability between the data structures used by different methods to represent trajectory topology. A future direction is to tailor these methods to improve their scalability and/or interoperability to allow their seamless connection to Lamian modules 1 and 2.
Uncertainties in the pseudotime analysis include both the uncertainty of the inferred pseudotemporal trajectory and the uncertainty of gene expression or cell abundance conditional on pseudotime. In Lamian, the trajectory inference uncertainties are characterized by bootstraping cells to compute detection rates. Conceptually, one could also account for the pseudotime reconstruction uncertainty in the downstream differential gene expression and cell abundance analysis by fitting the temporal gene expression and cell abundance curves for each bootstrapped tree. However, practically, it will make the differential analysis difficult to implement and make the results difficult to summarize and report. This is because trees reconstructed from different bootstrap samples can have different topologies due to the randomness. A branch that appears in one tree may not exist in another tree, and often it is unclear how one should align branches of different trees. It is unrealistic to enumerate all branches that occurred in bootstrapped trees, and the meaning of differential expression along a branch can be unclear if the branch does not always exist. For this reason, Lamian separated the evaluation of uncertainties of the inferred pseudotemporal trajectory (i.e. the construction of minimum spanning tree) and the evaluation of uncertainties of gene expression using a sequential “conditional" procedure. In other words, our module 1 evaluates the uncertainty of pseudotime (MST) construction. Next, conditional on a tree lineage and conditional on the corresponding inferred pseudotime, modules 3 and 4 perform differential analyses using bootstrap sampling to account for the celllevel uncertainty, followed by modelling sample and celllevel variability to account for gene expression variability and uncertainty. This sequential procedure avoids the complication of comparing different trees, making it easier for summarizing the analysis results to end users. Thus, while it may be imperfect, it provides a practical solution to this complicated problem. Developing better methods that can simultaneously account for all sources of uncertainties including pseudotime inference uncertainty, gene expression and cell abundance inference uncertainty, and crosssample and crosscell variability remains a future research topic that warrants further investigation.
Currently, the statistical model in Lamian is formulated for scRNAseq data. However, its general principle and statistical framework may be applicable to other data types such as singlecell ATACseq data as well, although the other data types may have different data characteristics that require one to tailor the model accordingly. These extensions will be a topic for future research.
Methods
Data
Human Cell Atlas bone marrow dataset (HCABM)
The raw count matrix of bone marrow scRNAseq data sequenced in 10x Genomics platform from 8 healthy donors were downloaded from the Human Cell Atlas (HCA) data portal^{37,38} (immune cell atlas of human hematopoietic system). The raw data consist of 42,925 genes and 290,861 cells. Cells with fewer than 5000 reads, fewer than 1,000 expressed genes (i.e. genes with nonzero read count), or more than 10% of reads mapped to the mitochondrial genome were deemed as low quality and filtered out. We also filtered out genes that were expressed in less than 0.1% of all cells. This results in a data matrix of 22,401 genes × 32,819 cells used for subsequent analyses. See Supplementary Notes and Fig. S21 for a more detailed discussion of filtering parameters and additional quality control (QC) plots.
COVID19 dataset (COVIDSu)
The raw count matrices of 256 PBMC 10x Genomics scRNAseq samples from 139 COVID19 patients were downloaded from EMTAB9357^{43}. We filtered out cells with fewer than 2,000 reads or 500 expressed genes or more than 10% mitochondrial reads. We also filtered out samples with fewer than 500 cells. Seurat(v.3.2.1)^{32} was applied to process, integrate data across samples and perform the cellular clustering with default settings. Cell types were annotated based on known marker genes. CD8+ T cells were identified using CD3D expression > 1 logscaled librarysizenormalized SAVERimputed read counts and CD8A expression > 1 criterion. Samples with fewer than 100 CD8+ T cells were filtered out. Among the total of 161 samples that passed the filters, we focused on analyzing samples from 66 mild and 48 moderate patients subsequently. This results in a data matrix of 26, 701 genes × 55, 953 cells used for subsequent analyses.
Tuberculosis (TB) dataset
We obtained the preprocessed gene expression file GSE158769_exprs_raw.tsv.gz of the TBRU dataset directly from GSE158769^{50}. This dataset consists of 500,089 memory T cells from 259 donors that were profiled with CITEseq. NAMPC1 and NAMPC2 coordinates of 393,998 cells were obtained from the authors of CNA^{51} (also see its supplementary file of Fig. 5A^{51}). Batches with at least one male sample and at least one female sample were retained. Samples with at least 1000 cells and genes with expression values > 0.1 in at least 1% of cells were retained. The NAMPC2 values were used to order cells to produce pseudotime. These processing steps result in a data matrix of 9317 genes × 337,191 cells (184 samples from 38 batches, with 100 female and 84 male samples) which was used for subsequent analyses.
Data harmonization and preprocessing
Before Lamian analysis, one needs to first harmonize data from different samples. The purpose of harmonization is to match cells of the same type across samples so that the same type of cells can be compared across samples. As such, it removes both biological differences of interest (e.g. the same cell type can have differential expression between two sample conditions which is removed by harmonization) and unwanted technical differences (e.g. batch effects) among samples. In downstream analyses, since one is interested in biological variation across samples and conditions, Lamian will use the original normalized gene expression values instead of the harmonizationcorrected expression values, and it will use a regression framework to remove unwanted technical variations such as batch effects but retain biological differences across samples. In this study, we used Seurat(v.3.2.1)^{32} to integrate (or harmonize) multiple samples in each dataset. For differential expression (DE) analysis, SAVER was used to impute gene expression values to address the dropouts in the data. All DE methods used imputed values except tradeSeq and condiments since they require count values as inputs. Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP)^{54} were used for visualization, and they were both run using default settings.
Constructing pseudotemporal trajectory and evaluating its uncertainty
In the default mode of Lamian, after samples are integrated, the harmonized data are used to construct pseudotemporal trajectory using a clusterbased minimum spanning tree (cMST) approach. Kmeans clustering is applied to cluster cells based on the top principal components (PCs) of log2transformed librarysizenormalized gene expression profiles. Trajectories are then inferred as in TSCAN by constructing a minimum spanning tree that treats cluster centers as nodes. The number of PCs and the cell cluster number are both determined using an elbow method^{23}. The origin of the pseudotime is specified by users based on marker gene expression (or the origin cell types if users input the cell types annotation). For example, in the bone marrow data, the cluster with the highest expression of hematopoietic stem cell (HSC) marker CD34 was set as the origin. Once the origin of the trajectory is given, one can enumerate all paths and branches. Branches are identified based on nodes with degree > 2.
For each of the branch, we characterize its uncertainty using its detection rate in 1000 bootstrap samples. Each bootstrap sample is created by sampling cells from the original data with replacement. Cells in the bootstrap sample are used to reconstruct pseudotemporal trajectory using the same cMST approach as in the original data. The origin of the pseudotime in a bootstrap sample is determined using the cell cluster with the smallest mean of cells’ pseudotime in the original data. We then ask whether each branch in the original data is also identified in the bootstrap sample by performing pairwise comparison of branches between the original and bootstrap data. For a pair of branches (one from original data and one from bootstrap sample), we use the Jaccard index to evaluate their overlap (i.e., what percentage of cells in these two branches are shared). If the Jaccard index exceeds a cutoff, then the branch in the original data is called detected in the bootstrap sample. To determine the cutoff, a null distribution of Jaccard index is constructed by evaluating the overlap between the cells in the branch and a randomly sampled set of cells with the cell number matching those in the branch for 1000 times. The 0.99 quantile of this null distribution is used as the cutoff. After comparing the original trajectory with all bootstrap samples, the detection rate of a branch is defined as the proportion of bootstrap samples in which the original branch can be detected.
Tree variability across samples and differential topology analysis
For each sample, the proportion of cells in each branch is calculated and referred to as “branch cell proportion”. For each branch, the variance of branch cell proportion across samples is reported to characterize its crosssample variability.
To test differential topology, by default a binomial logistic regression model is fitted for each branch. Here the branch cell count is treated as the dependent variable and modeled using binomial distribution Binomial(n, p) where n is the total cell count in a sample and p is the underlying true branch cell proportion. The regression models \(\log (p/(1p))\) as a function of the sample covariates which are specified by users as the independent variables. Statistical significance of the association between a sample covariate and the branch cell proportion is determined by testing whether the corresponding regression coefficient is zero using Wald test. The pvalues are adjusted for multiple testing using the BenjaminiHochberg procedure to obtain false discovery rates (FDRs)^{31}. By default, FDR≤0.05 is used as the significance cutoff. As an example, if two conditions have different topologies and each has a conditionspecific branch, then after data integration and trajectory construction, one will have a branch (branch A) that only contains cells from condition 1, and another branch (branch B) that only contains cells from condition 2. The differential topology test will test the cell proportion differences between the two conditions for each branch. For branch A, it will report that there is a significant difference in cell abundance between condition 1 and condition 2, and it will also report the mean cell proportion in that branch for each condition. Users will be able to see that the proportion of cells in branch A in each sample from condition 2 is almost zero, but the cell proportion in branch A for condition 1 is above zero. Therefore, based on this information one will know that branch A is likely condition1specific. Similarly, one can tell that branch B is condition2specific since the cell proportion on that branch is almost zero for condition 1 and is positive for condition 2, and the difference between the two conditions is significant.
Optionally, users can also fit a multinomial logistic regression by considering all branches jointly. Assume there are L branches, and let p_{1}, …, p_{L} be the underlying true branch cell proportions for these branches in a given sample (\(\mathop{\sum }\nolimits_{l=1}^{L}{p}_{l}=1\)). In the multinomial logistic regression, one chooses a branch as the reference branch. By default, Lamian chooses the most abundant branch (i.e. the branch with the largest number of cells) as the reference branch. Without loss of generality, let L denote the reference branch. The model assumes that the branch cell counts in a sample follow a multinomial distribution Multinomial(n, (p_{1}, …, p_{L})) where n is the total cell count of the sample. It models \(\log ({p}_{l}/{p}_{L})\) (l = 1, …, L − 1) as functions of sample covariates. Statistical significance of the association between a sample covariate and log odds is determined by testing whether the corresponding regression coefficients are zero, similar to binomial logisitic regression. Compared to fitting a binomial logistic regression for each branch, multinomial logistic regression allows one to account for the fact that cell abundance in different branches are not independent. The binomial logistic regression, on the other hand, may allow one to conveniently explore whether branch cell proportion of a given branch increases or decreases (Supplementary Notes).
Modeling gene expression along pseudotime
Given a pseudotemporal path or branch, Lamian will describe how gene expression Y varies along pseudotime t and characterize the relationship between each gene’s pseudotemporal expression pattern Y(t) and V sample covariates X_{1}, …, X_{V} (e.g. disease status, age, etc.) using a functional mixed effects model.
Without loss of generality, below we presents the statistical model for one gene. All other genes can be analyzed in the same way. We use lowercase letters s and c to denote sample and cell, respectively, and we use capital letter S to denote the total number of samples. Assume that sample s consists of C_{s} cells. Let t_{sc} be the pseudotime of cell c in sample s. Given a gene, let y_{sc} denote its expression level in cell c of sample s. Let \({{{{{{{{\bf{x}}}}}}}}}_{s}={(1,{x}_{s1},\ldots,{x}_{sV})}^{T}\) be the realized values of covariates in sample s. Here, we introduced an additional term x_{s0} ≡ 1 as an intercept term for the subsequent regression model.
We model each gene’s expression pattern along pseudotime as functional curves and represent the function using a total of K + 1 Bspline basis functions ϕ_{0}(t), ϕ_{1}(t), …, ϕ_{K}(t). Here K is the number of equidistant knots used to define Bspline bases. The gene’s functional curve in sample s is \({Y}_{s}(t)=\mathop{\sum }\nolimits_{k=0}^{K}{\phi }_{k}(t){b}_{sk}\). For each gene, the optimal K is automatically chosen by comparing values ranging from 0 to a predefined maximum (20 by default) and selecting the one that minimizes the Bayesian Information Criterion (BIC). The BIC for a given K is calculated as \({{{{{{{{\rm{BIC}}}}}}}}}_{K}=KS\ln ({\sum }_{s}{C}_{s})2{\sum }_{s}{l}_{K,s}+const\). Here const is a constant term that does not depend on K (hence irrelevant for finding optimal K), and l_{K,s} is the loglikelihood of the Bspline regression for sample s (i.e. we fit a linear regression where the response variable is the gene expression in cells and the independent variables are the K + 1 Bspline bases).
The observed data of the gene are assumed to be generated from this unobserved function after adding celllevel random noise ϵ_{sc} as follows:
where
Since all samples share the same Bspline bases ϕ(t), the samplespecific temporal pattern is described via the samplespecific regression coefficients b_{s}. To model the relationship between a gene’s pseudotemporal pattern Y_{s}(t) and sample covariates x_{s} while accounting for sampletosample variability that cannot be explained by the covariates, we further assume
where B is a (K + 1) × (V + 1) matrix representing unknown fixed effects of covariates, and u_{s} is a (K + 1) × 1 vector representing unobserved samplelevel random effects (i.e. random variations among samples with the same covariate values):
Here Ω is a (K + 1) × (K + 1) positive definite matrix. Note that the degrees of freedom for estimating samplelevel covariance matrix Ω after accounting for V + 1 covariates are S − (V + 1) and one needs at least K + 1 degrees of freedom to estimate a full rank covariance matrix with dimension K + 1. Therefore, if the sample size S does not exceed V + K + 2, we do not have enough information to estimate an unconstrained Ω. In that scenario, we add a constraint by assuming Ω = ω^{2}I_{(K+1)×(K+1)} where I represents an identity matrix. This constraint reduces the number of parameters in Ω to 1. Define
\({{{{{{{{\boldsymbol{\beta }}}}}}}}}_{k.}^{T}\) is the kth row of B, corresponding to regression coefficients for basis ϕ_{k}(t). β._{v} is the v^{th} column of B, corresponding to regression coefficients for the vth covariate X_{v}. If gene g’s expression pattern does not depend on X_{v}, then β._{v} = 0.
To facilitate developing the model fitting algorithm, Eq. (3) can also be rewritten in a vectorized form. Let I_{K} be a K × K identity matrix, and
Then Eq. (3) can also be written as:
Thus, the observed data model in Equation (1) is equal to
where \({\epsilon }_{sc} \sim N(0,\, {\sigma }_{s}^{2})\) and \({{{{{{{{\bf{u}}}}}}}}}_{s} \sim N({{{{{{{\bf{0}}}}}}}},\, {\sigma }_{s}^{2}{{{{{{{\boldsymbol{\Omega }}}}}}}})\). We further assume that \({\sigma }_{s}^{2}\) follows an inverseGamma distribution:
For the given gene, let \({{{{{{{{\bf{y}}}}}}}}}_{s}={[{y}_{s1},\ldots,{y}_{s{C}_{s}}]}^{T}\) denote its expression in all cells in sample s, \({{{{{{{{\boldsymbol{\epsilon }}}}}}}}}_{s}={[{\epsilon }_{s1},\ldots,\, {\epsilon }_{s{C}_{s}}]}^{T}\), and \({{{{{{{{\boldsymbol{\Phi }}}}}}}}}_{s}={[{{{{{{{\boldsymbol{\phi }}}}}}}}({t}_{s1}),\ldots,\, {{{{{{{\boldsymbol{\phi }}}}}}}}({t}_{s{C}_{s}})]}^{T}\), then Eq. (7) can also be written in a matrix form as:
The above model can be fit using an ExpectationMaximization (EM) algorithm (see details in the Supplementary Notes). The algorithm can estimate the unknown parameters \({{{{{{{\boldsymbol{\Theta }}}}}}}}=\left\{{{{{{{{\boldsymbol{\beta }}}}}}}},\, {{{{{{{\boldsymbol{\Omega }}}}}}}},\, \alpha,\, \eta \right\}\) and infer \({\sigma }_{s}^{2}\) based on the observed data. Here \({\sigma }_{s}^{2},\, \alpha,\, \eta \in {\mathbb{R}}\), \({{{{{{{\boldsymbol{\Omega }}}}}}}}\in {{\mathbb{R}}}^{(K+1)\times (K+1)}\), \({{{{{{{\boldsymbol{\beta }}}}}}}}\in {{\mathbb{R}}}^{(K+1)(V+1)}\).
Detecting differential expression associated with sample covariate (XDE)
Under the Lamian model, detecting differential expression associated with a sample covariate X_{v} amounts to testing whether \({{{{{{{{\boldsymbol{\beta }}}}}}}}}_{.v}={[{\beta }_{0v},\, {\beta }_{1v},\ldots,\, {\beta }_{Kv}]}^{T}={{{{{{{\bf{0}}}}}}}}\). An XDE gene is a gene with β._{v}≠ 0. For an XDE gene, if β_{0v} = β_{1v} = … = β_{Kv} = c (i.e. all β_{kv}s are equal), then the effect of the covariate is to shift the gene’s pseudotemporal curve up or down by a constant c for every unit change in X_{v} (because the Bspline bases satisfy \(\mathop{\sum }\nolimits_{k=0}^{K}{\phi }_{k}(t)=1\)). Such a gene is called XDE with mean shift only. If β_{kv}s are not all equal for an XDE gene, then the covariate also changes the trend of the gene’s pseudotemproal curve. To systematically detect and classify XDE genes, we consider the following nested models:

M_{0}: \({{{{{{{{\boldsymbol{\beta }}}}}}}}}_{.v}={[\, {\beta }_{0v},\, {\beta }_{1v},\ldots,\, {\beta }_{Kv}]}^{T}={{{{{{{\bf{0}}}}}}}}\).

M_{1}: β_{.v} ≠ 0 and β_{0v} = β_{1v} = … = β_{Kv} = c.

M_{2}: β_{.v} ≠ 0.
We conduct the following hypothesis tests:

Overall XDE test: the null model M_{0} is compared with the alternative model M_{2}. Rejecting M_{0} implies XDE.

Mean test: M_{0} and M_{1} are compared. Rejecting M_{0} implies mean shift.

Trend test: M_{1} and M_{2} are compared. Rejecting M_{1} implies trend difference.
A gene is called XDE if the XDE test is significant. For an XDE gene, if the mean test is significant but the trend test is not significant, the gene is called XDE with mean shift only. If the trend test is significant but the mean test is not, then the XDE gene is called XDE with trend difference only. If both the mean test and the trend tests are significant, then the XDE gene is called XDE with both mean shift and trend difference.
To conduct a hypothesis test comparing two models, we use a permutationbased likelihood ratio test. Without loss of generality, consider comparing null model M_{0} versus alternative model M_{1} as an example (other model comparisons are handled similarly). The test statistic is the loglikelihood ratio (LLR) between M_{1} and M_{0} computed using the observed data. To construct the null distribution of the test statistics, we use a permutation approach. In each permutation, we first bootstrap the cells (keeping cell number the same as the observed data) to account for the pseudotime variability, and then we permute the values of the covariate X_{v} among the samples. Using the permuted data, the models are refit and the LLR statistic is recomputed. Using the LLR obtained from all permutations (by default, 100 times), an empirical distribution is fitted using kernel density estimate (base::density()) to serve as the null distribution. The pvalue is calculated as the tail probability of the null distribution (i.e. probability that a LLR drawn from the null distribution is equal or larger than the observed LLR). The pvalues from all genes are adjusted for multiple testing using the BenjaminiHochberg procedure to obtain false discovery rates (FDRs)^{31}. By default, FDR ≤ 0.05 is used as the significance cutoff.
Besides permutation test, we also provide an option to compute pvalues and FDR based on the asymptotic null distribution, that is, chisquared distribution, of the likelihoodratio test (stat::pchisq()). The degree of freedom is the difference in the number of parameters between the full and null model. This option can be used if users need computational efficiency and are willing to sacrifice some accuracy to control FDR.
Adjusting for confounding variables such as batch effects
Since Lamian uses a general regression framework, one can adjust for confounding variables such as batch effects by properly specifying the design matrix. The design matrix x can contain multiple columns corresponding to multiple sample covariates. For example, given eight samples (4 males and 4 females) sequenced in three batches, the design matrix for XDE can be specified as
Here each row corresponds a sample. The first column is the intercept. The second column represents samples’ sex (1 for female, 0 for male). The third and fourth columns are dummy variables to indicate batches. Suppose one is interested in detecting XDE genes associated with sex, one will only use the regression coefficients for the sex variable to identify differential genes. The batch effects are accounted for by columns 3 and 4.
Detecting differential expression along pseudotime (TDE)
Unlike Lamian, most existing pseudotime methods do not detect differential expression associated with covariates (XDE). Instead, they detect differential expression along pseudotime (TDE). While our main focus is to detect XDE genes, Lamian also provides a function to detect TDE genes.
When all samples are from one group without covariate, the Equation (3) becomes
Note that \(\mathop{\sum }\nolimits_{k=0}^{K}{\phi }_{k}(t)=1\). Thus, if β_{00} = β_{10} = … = β_{K0} = c (i.e. all β_{k0}s are equal), then the pseudotemporal pattern shared by samples is ϕ(t)^{T}β_{.0} = c, which is a constant that does not change along pseudotime. Therefore, TDE detection can be formulated as comparing the following two models:

H_{0}: β_{k0} (k = 0, 1, …, K) are all equal

H_{1}: β_{k0} (k = 0, 1, …, K) are not necessarily all equal
This yields the following hypothesis test:

TDE test: H_{0} and H_{1} are compared. Rejecting H_{0} implies differential expression along pseudotime (TDE).
The TDE test can also be generalized to account for sample covariates. With covariates, the compared models become:

H_{0}: β_{kv} (k = 0, 1, …, K) within each column of B in Equation (3) are equal (i.e. β._{v} = c_{v}1 where v = 0, 1, …, V and 1 represents a K + 1 vector with all elements equal to 1)

H_{1}: No constraint on B
The hypothesis test is conducted using a permutationbased likelihood ratio test. We first compute the loglikelihood ratio (LLR) between H_{1} and H_{0} as the test statistic using observed data. We then construct the null distribution of LLR using permutations. In each permutation, we first bootstrap the cells to account for pseudotime variability, and we then permute the pseudotime of the cells within each sample. Using the permuted data, the models are refit and the LLR statistic is recomputed. The null distribution is derived by applying the kernal density estimate (base::density()) to the empirical LLR statistics obtained from all permutations (by default, for 100 times). Pvalue is calculated as the tail probability of the empirical distribution. The pvalues from all genes are adjusted for multiple testing using the BenjaminiHochberg procedure to obtain FDR^{31}. By default, FDR≤0.05 is used as the significance cutoff.
EM algorithm for fitting the Lamian model
The algorithm used to fit the Lamian model is provided in Supplementary Notes in detail.
Analysis of cell density changes
Given a pseudotemporal path or branch, we divide the pseudotime from 0 to its maximum into 100 consecutive intervals of equal lengths. The number of cells in each interval t and sample s is counted and denoted as r_{st}. One approach to modeling cell density changes is to model r_{st} using a count distribution (e.g. Poisson or Negative binomial) with mean L_{s}λ_{st} where L_{s} is a samplespecific normalizing constant corresponding to the total cell number on the pseudotemporal path. One can then model \(\log {\lambda }_{st}\) as functional curves using Bspline bases similar to the gene expression model. Fitting such a model, however, requires algorithms such as Markov Chain Monte Carlo which makes this approach less appealing computationally. We therefore use an alternative and simpler approach in which r_{st}/L_{s} is modeled in the same way as the gene expression model in equation (1) (i.e. treating time interval t as cell and treating r_{st}/L_{s} in the same way as y_{sc}). In this way, testing if the cell density changes along pseudotime (TCD test) or if a sample covariate changes the pseudotemporal cell density curves (XCD test) can be handled following the same procedure for TDE and XDE tests. This approach is more computationally efficient and yields reasonable results empirically in our benchmark data.
Comparisons with existing methods
XDE detection
For detecting differential expression associated with covariates, we compared Lamian with tradeSeq^{27} (v.1.1.23), limma^{42} (v.3.40.6), monocle2^{21} (v2.14.0), PhenoPath^{28} (v1.8.0), and condiments^{29} (v.0.99.14). We applied tradeSeq by considering the cells belonging to two groups as those belonging to two lineages. The cell weights on each group were set as 0.99 and 0.01 respectively. We then fit the models by running the fitGAM() function with the default setting. All three types of tests for betweenlineage comparisons were included. Specifically, earlyDETest(), diffEndTest() and patternTest() were applied to identify early drivers of differentiation, differentiated markers and expression patterns over pseudotime, respectively. limma was applied by pooling each sample as a pseudobulk. Its functions lmFit(), eBayes(), and topTable() were used to perform the test. Monocle2 provides a trajectoryconditioned test ‘monocle2TrajTest’ which compares a full model g(E(Y)) ~ β_{0} + f_{1}(G) + f_{2}(ϕ) with a null model g(E(Y)) ~ β_{0} + f_{1}(G) (see page 23 of the Monocle2 supplemental material^{21}). Here E(Y) is the expected values of the transcript counts Y, g( ⋅ ) is the \(\log\) function, G indicates sample group (note: in the original Monocle2 paper G refers to genotype), ϕ is pseudotime, f_{1}(G) models the different intercepts for different groups (note: f_{1}(G) does not involve pseudotime ϕ), and f_{2}(ϕ) is a nonparametric function that models gene expression as a function of pseudotime ϕ. The ‘monocle2TrajTest’ evaluates whether f_{2}(ϕ) is zero and it assumes that f_{2}(ϕ) is the same for different sample groups G. Therefore, the trajectoryconditioned test ‘monocle2TrajTest’ is essentially a TDE test and does not detect XDE. To detect XDE, we modified the buildin functions in Monocle2, resulting in a revised test which we call ‘monocle2TrajTestCorr’. In our revised test, the null model is g(E(Y)) ~ β_{0} + f_{2}(ϕ), and the full model is g(E(Y)) ~ β_{0} + f_{1}(G) + f_{2}(ϕ) + f_{2}(ϕ)f_{1}(G). Comparing these two models will test whether f_{1}(G) + f_{2}(ϕ)f_{1}(G) (i.e., the difference between different sample groups including both intercept and pseudotemporal trend differences) is zero. To run PhenoPath, we provided sample covariate information as a design matrix where the first column is the phenotype (1 or 0, indicating the two sample groups), and the second to Sth columns are S − 1 dummy variables to indicate the S samples. Pvalues are 1 − the tail probability of the test statistics’ zscores (m_beta divided by s_beta) which are assumed to be standard normal. condiments was run based on its user manual. The XDE in condiments was implemented using the conditionTest() in the tradeSeq package.
TDE detection
For detecting differential expression along pseudotime, we compared Lamian with Monocle2 (v.2.14.0), Monocle3^{22} (v.3.0.2.1), tradeSeq^{27} (v.1.1.23) and TSCAN^{23} (v.1.7.0). All methods other than Lamian treat cells from all samples as if they were from one sample. Monocle2 performs the testing with an approximate χ^{2} likelihood ratio test. In this test, generalized additive models (GAMs) are applied to fit the gene expression against pseudotime as a full model, while the null model considers the gene expression as a constant along pseudotime. Monocle3 performs trajectory inference on the coordinates from UMAP and then implements the Moran’s I test to identify genes whose expression is associated with pseudotime with statistical significance. TSCAN applies the same fitting and testing method as Monocle2 except that TSCAN uses MGCV package and Monocle2 applies VGAM package. tradeSeq is used by Slingshot^{24} to identify dynamic genes along pseudotime. Both tests designed for withinlineage comparisons in tradeSeq were included (startVsEndTest() and associationTest()). We also tried pseudotimeDE, but it did not output results within one week and with 400GB and 20 CPU cores.
Significance cutoff
All pvalues are provided as exact values except for situations where the pvalue computation reaches the computer’s precision lower bound (i.e. the smallest value allowed by the precision). In that case, pvalues are reported as smaller than the precision lower bound. All pvalues reported by each method were adjusted for multipletesting using the BenjaminiHochberg procedure to obtain false discovery rates (FDRs)^{31}. By default, FDR ≤ 0.05 is used as the significance cutoff.
Simulations
XDE detection
We first created null simulation data where we do not expect any XDE genes. The simulation was based on the 13,269 cells on the erythroid branch in the real HCABM data described above. For the null simulation in Fig. 4a, the eight bone marrow scRNAseqsamples were randomly partitioned into two groups (group 0 and 1). Next, to remove any group differences for a given gene, we divided the pseudotime into 100 nonoverlapping intervals of equal lengths. Within each interval and within each sample group, we calculated the median of the gene’s normalized expression. For cells in the sample group with lower median value, we added their expression with the difference of median expression between the two groups so that the two groups have similar expression values.
Building upon the null dataset above, we then introduced in silico spikein differential signals with varying strengths and pseudotemporal patterns between the two sample groups to a random set of genes. This spikein simulation data set was used in Fig. 4b–h. We randomly selected 20% (1814) genes as the gold standard XDE genes (gs genes) and randomly assigned them to 3 groups: trend difference only, mean shift only, and both trend & mean differences. We then spiked in differential trend, mean, or both trend & mean signals into these gold standard genes based on their differential type. To generate the spikein signals, we selected highly variable genes from the remaining 80% nongoldstandard (nongs) genes using cells in sample group 0 and using their original unpermuted data. To select highly variable genes, we applied Bsplines to fit the relationship between the standard deviation (SD) and the mean of gene expression of the nongoldstandard (nongs) genes across cells in group 0. Genes with positive residuals (i.e. SD is larger than its expected value estimated from the mean expression) are selected as highly variable. We applied kmeans clustering to cluster these genes into 5 clusters using their standardized log_{2}transformed SAVERimputed expression. Louvain and Gaussian mixture model clustering have also been separately applied to examine the sensitivity to clustering methods. Here the cluster number 5 was determined using an elbow method. For each gene that was clustered, we fit a Bspline on the log_{2}transformed SAVERimputed expression against pseudotime. We evaluated the magnitude of change of the gene along pseudotime by calculating a F − statistic that compares a full model (which assumes gene expression along pseudotime is modeled using the Bspline curve plus additive noise) and a null model (which assumes gene expression along pseudotime is a constant plus additive noise). We used highly variable genes (i.e. those with positive residuals) as “source genes”. We ordered source genes in increasing F − statistics. We categorized the tail 1814 source genes into 4 groups from the smallest to the largest F − statistics to represent signal strengths from weakest (1) to highest (4). In each signalstrength simulation, we added the gene expression profiles in each sample from the source genes in the same strength group onto those gold standard genes. The signalspikein procedures were performed in SAVERimputed gene expression matrix and original count matrix in parallel. For gold standard genes with trend difference, we added signals to both group 0 and 1, except that the signals were permuted before adding to group 1. For gold standard genes with mean shift, we permuted the source gene expression profiles within each sample before adding signals to group 0. For gold standard genes with both trend and mean differences, we added source signals directly to group 0 cells without centering the data.
TDE, TCD, and XCD detection
Simulations for evaluating TDE, TCD and XCD detection are presented in Supplementary Notes.
Evaluation
Evaluation in simulation
Performance of Lamian (Lamian.pm), Lamian.chisq and other existing methods is compared based on FDR difference and AUC. FDR difference is the difference between the area under the realFDR vs. reportedFDR curve and the diagonal line. The differences between the true and reported FDRs were calculated for overall XDE test, trend test and mean test. Within each set of goldstandard genes (trend, mean, and trend & mean), the area under sensitivityrealFDR curve (AUC) was also calculated.
Evaluation on sex chromosomes
To evaluate overlap between XDE genes reported by different methods and sex chromosome genes as gold standard (see Fig. S12a), we counted the overlap (i.e. the number of sex chromosome genes) among the top N XDE genes for different Ns. The mean of the overlap across all Ns was used as the observed overlap statistic. Null distribution was constructed by permuting the order of the genes which originally were ordered by increasing FDR. Violin plots show the permutation null distribution used to determine the statistical significance of the observed overlap statistics (dots), and pvalues are shown on the right of each plot.
Visualization
Each heatmap to visualize XDE test results is organized in four whitebarseparated panels to show the expression patterns of XDE genes (rows) by cells (columns) ordered by pseudotime. The 1st and 2nd panels show original values and modelfitted values of gene expression. Cells from the samples in group 0 and 1 are separated. The 3rd and 4th panels show the standardized modelfitted group difference (trend difference) and the mean shift between groups, where white space denotes no significant difference.
Data availability
The data used in this manuscript are all downloaded from publicly available data sources. Specifically, HCABM data were downloaded from HCA data portal (immune cell atlas of human hematopoietic system)^{37,38} [https://data.humancellatlas.org/explore/projects/cc95ff892e684a08a234480eca21ce79]. COVID19 data were downloaded from the ArrayExpress database under accession code EMTAB9357 [https://www.ebi.ac.uk/arrayexpress/experiments/EMTAB9357]^{43}, and TB data were downloaded from the Gene Expression Omnibus (GEO) database under accession code GSE158769 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158769]^{50}. All relevant information about data is described in the Methods section. All processed data generated in this study are provided in the Supplementary Information/Source Data file. Source data are provided with this paper and in Zenodo under accession code https://doi.org/10.5281/zenodo.8274409[https://zenodo.org/record/8274409]^{55}. Source data are provided with this paper.
Code availability
The Lamian package (v.0.99.1) is provided as an opensource software package with a detailed user manual available at https://github.com/Winnie09/Lamian. All codes to reproduce the presented analyses are publicly available in Github repository https://github.com/Winnie09/trajectory_variabilityand also in Zenodo under the accession code DOI: 10.5281/zenodo.8197779 [https://zenodo.org/record/8197779]^{56}. R version 4.0.2, topGO (v.2.42.0)^{57}, and ComplexHeatmap (v.2.6.2)^{58} were used to perform the analyses in the manuscript. The R package ggplot2 (v.3.3.0)^{59} for data visualization was used. All competing methods are described in Table S1. BioRender (BioRender.com) was used for generating part of Fig. 1a under a paid subscription, and the publication agreement number is FS25T0PP8E.
References
Trapnell, C. Defining cell types and states with singlecell genomics. Genome Res. 25, 1491–1498 (2015).
Cannoodt, R., Saelens, W. & Saeys, Y. Computational methods for trajectory inference from singlecell transcriptomics. Euro. J. Immunol. 46, 2496–2506 (2016).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of singlecell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Bendall, S. C. et al. Singlecell trajectory detection uncovers progression and regulatory coordination in human b cell development. Cell 157, 714–725 (2014).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating singlecell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Crinier, A. et al. Singlecell profiling reveals the trajectories of natural killer cell differentiation in bone marrow and a stress signature induced by acute myeloid leukemia. Cell. Mol. Immunol. 18, 1290–1304 (2021).
Chen, Z. et al. Tcf1centered transcriptional network drives an effector versus exhausted cd8 t cellfate decision. Immunity 51, 840–855 (2019).
Yao, C. et al. Singlecell rnaseq reveals tox as a key regulator of cd8+ t cell persistence in chronic infection. Nat. Immunol. 20, 890–901 (2019).
Herring, C. A. et al. Unsupervised trajectory analysis of singlecell rnaseq and imaging data reveals alternative tuft cell origins in the gut. Cell Syst. 6, 37–51 (2018).
van den Heuvel, A. et al. Singlecell rna sequencing in facioscapulohumeral muscular dystrophy disease etiology and development. Hum. Mol. Genet. 28, 1064–1075 (2019).
Paik, D. T., Cho, S., Tian, L., Chang, H. Y. & Wu, J. C. Singlecell rna sequencing in cardiovascular development, disease and medicine. Nat. Rev. Cardiol. 17, 457–473 (2020).
Mukherjee, S. et al. Molecular estimation of neurodegeneration pseudotime in older brains. Nat. Commun. 11, 5781 (2020).
Wilk, A. J. et al. A singlecell atlas of the peripheral immune response in patients with severe covid19. Nat. Med. 26, 1070–1076 (2020).
Wen, W. et al. Immune cell profiling of covid19 patients in the recovery stage by singlecell sequencing. Cell Discov. 6, 31 (2020).
Lee, J. S. et al. Immunophenotyping of covid19 and influenza highlights the role of type i interferons in development of severe covid19. Sci. Immunol. 5, eabd1554 (2020).
Arunachalam, P. S. et al. Systems biological assessment of immunity to mild versus severe covid19 infection in humans. Science 369, 1210–1220 (2020).
SchulteSchrepping, J. et al. Severe covid19 is marked by a dysregulated myeloid cell compartment. Cell 182, 1419–1440 (2020).
Silvin, A. et al. Elevated calprotectin and abnormal myeloid cell subsets discriminate severe from mild covid19. Cell 182, 1401–1418 (2020).
Zhu, L. et al. Singlecell sequencing of peripheral mononuclear cells reveals distinct immune response landscapes of covid19 and influenza patients. Immunity 53, 685–696 (2020).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Qiu, X. et al. Reversed graph embedding resolves complex singlecell trajectories. Nat. Methods 14, 979982 (2017).
Cao, J. et al. The singlecell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Ji, Z. & Ji, H. Tscan: Pseudotime reconstruction and evaluation in singlecell rnaseq analysis. Nucleic Acids Res. 44, e117 (2016).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for singlecell transcriptomics. BMC Genom. 19, 477 (2018).
Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on singlecell data using knearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).
Zhao, J. et al. Detection of differentially abundant cell subpopulations in scRNAseq data. Proc. Natl. Acad. Sci. USA 118, e2100293118 (2021).
Van den Berge, K. et al. Trajectorybased differential expression analysis for singlecell sequencing data. Nat. Commun. 11, 1201 (2020).
Campbell, K. R. & Yau, C. Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data. Nat. Commun. 9, 2442 (2018).
de Bézieux, H. R., Van den Berge, K., Street, K. & Dudoit, S. Trajectory inference across multiple conditions with condiments: differential topology, progression, differentiation, and expression. bioRxiv https://doi.org/10.1101/2021.03.09.433671 (2021).
Song, D. & Li, J. J. Pseudotimede: inference of differential gene expression along cell pseudotime with wellcalibrated pvalues from singlecell rna sequencing data. Genome Biol. 22, 124 (2021).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc.: Series B (Methodolog.) 57, 289–300 (1995).
Stuart, T. et al. Comprehensive integration of singlecell data. Cell 177, 1888–1902 (2019).
Korsunsky, I. et al. Fast, sensitive and accurate integration of singlecell data with harmony. Nat. Methods 16, 1289–1296 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for singlecell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Luecken, M. D. et al. Benchmarking atlaslevel data integration in singlecell genomics. Nat. Methods 19, 41–50 (2022).
Tian, L. et al. Benchmarking single cell rnasequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).
Regev, A. et al. Science forum: the human cell atlas. Elife 6, e27041 (2017).
Li, B. et al. Human cell atlas data portal: census of immune cells. https://data.humancellatlas.org/explore/projects/cc95ff892e684a08a234480eca21ce79. Accessed on 2019–08–30.
Erie, E. A. et al. Mice deficient in the als2 gene exhibit lymphopenia and abnormal hematopietic function. J. Neuroimmunol. 182, 226–231 (2007).
Deschepper, C. F. Regulatory effects of the uty/ddx3y locus on neighboring chromosome y genes and autosomal mrna transcripts in adult mouse nonreproductive cells. Sci. Rep. 10, 14900 (2020).
GalanCaridad, J. M. et al. Zfx controls the selfrenewal of embryonic and hematopoietic stem cells. Cell 129, 345–357 (2007).
Smyth, G. K. Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, 397–420 (Springer, New York, 2005).
Su, Y. et al. Multiomics resolves a sharp diseasestate shift between mild and moderate covid19. Cell 183, 1479–1495 (2020).
Szabo, S. J. et al. Distinct effects of tbet in th1 lineage commitment and ifnγ production in cd4 and cd8 t cells. Science 295, 338–342 (2002).
Intlekofer, A. M. et al. Effector and memory cd8+ t cell fate coupled by tbet and eomesodermin. Nat. Immunol. 6, 1236–1244 (2005).
Dominguez, C. X. et al. The transcription factors zeb2 and tbet cooperate to program cytotoxic t cell terminal differentiation in response to lcmv viral infection. J. Exp. Med. 212, 2041–2056 (2015).
Guan, T. et al. Zeb1, zeb2, and the mir200 family form a counterregulatory network to regulate cd8+ t cell fates. J. Exp. Med. 215, 1153–1168 (2018).
Rutishauser, R. L. et al. Transcriptional repressor blimp1 promotes cd8+ t cell terminal differentiation and represses the acquisition of central memory t cell properties. Immunity 31, 296–308 (2009).
Mathew, D. et al. Deep immune profiling of covid19 patients reveals distinct immunotypes with therapeutic implications. Science 369, eabc8511 (2020).
Nathan, A. et al. Multimodally profiling memory t cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).
Reshef, Y. A. et al. Covarying neighborhood analysis identifies cell populations associated with phenotypes of interest from singlecell transcriptomics. Nat. Biotechnol. 40, 355–363 (2022).
Hertz, D. & Schneider, B. Sex differences in tuberculosis. In Seminars in Immunopathology, vol. 41, 225–237 (Springer, 2019).
Begley, C. G. & Ioannidis, J. P. Reproducibility in science: improving the standard for basic and preclinical research. Circ. Res. 116, 116–126 (2015).
McInnes, L. et al. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Hou, W. et al. Lamian Paper Source Data, Zenodo https://doi.org/10.5281/zenodo.8274409 (2023).
Hou, W. et al. A statistical framework for differential pseudotime analysis with multiple singlecell rnaseq samples, Zenodo https://doi.org/10.5281/zenodo.8197779 (2023).
Alexa, A. & Rahnenführer, J. Gene set enrichment analysis with topgo. Bioconductor Improv 27, 1–26 (2009).
Gu, Z. Complex heatmap visualization. Imeta 1, e43 (2022).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Use R! (Springer International Publishing, Switzerland, 2016).
Acknowledgements
This work is supported by the National Institutes of Health grants R01HG010889 and R01HG009518 to HJ, R00HG009007 to SCH, and K99HG011468 and R00HG011468 to WH. SCH is also supported by CZF2019002443 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. We would like to thank the Maryland Advanced Research Computing Center (MARCC) and Rockfish systems, and The Joint High Performance Computing Exchange (JHPCE) for providing computing resources.
Author information
Authors and Affiliations
Contributions
W.H., Z.J., S.C.H. and H.J. conceived the study. H.J., S.C.H. and W.H. conceptualized the Lamian framework. W.H. and H.J. developed the statistical model and algorithm with feedback from Z.J. and S.C.H. W.H. implemented the model and software. Z.J., W.H., and Z.C. prepared the data. W.H. and Z.J. analyzed the data. W.H., Z.C., E.J.W., S.C.H. and H.J. interpreted the results. W.H., S.C.H. and H.J. drafted the manuscript. All authors edited and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
E.J.W. is a member of the Parker Institute for Cancer Immunotherapy that supports research in the Wherry lab. E.J.W. is an advisor for Danger Bio, Marengo, Janssen, NewLimit, Pluto Immunotherapeutics Related Sciences, Santa Ana Bio, and Synthekine. E.J.W. is a founder of and holds stock in Arsenal Biosciences and holds stock in Coherus. Other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks KimAnh Lê Cao, Wei Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. This article has been peer reviewed as part of Springer Nature’s Guided Open Access initiative.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hou, W., Ji, Z., Chen, Z. et al. A statistical framework for differential pseudotime analysis with multiple singlecell RNAseq samples. Nat Commun 14, 7286 (2023). https://doi.org/10.1038/s4146702342841y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s4146702342841y
This article is cited by

Leveraging neighborhood representations of singlecell data to achieve sensitive DE testing with miloDE
Genome Biology (2024)

DELVE: feature selection for preserving biological trajectories in singlecell data
Nature Communications (2024)

Trajectory inference across multiple conditions with condiments
Nature Communications (2024)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.