## Introduction

Single-cell simulation engines are becoming increasingly important for testing and benchmarking computational methods, a pressing need in the widely expanding field of single-cell biology. Complementary to real biological data, synthetic data provides a valuable alternative where the actual ground truth is completely known and thus can be compared to, in order to make quantitative evaluations of computational methods that aim to reconstruct this ground truth1. In addition, simulation engines are more flexible when it comes to stress-testing computational methods, for example by varying the parameters of the simulation, such as the amount of noise, samples, and cells measured, allowing benchmarking of methods over a wide range of possible scenarios. In this way, they can even guide the design of real biological experiments, finding out the best conditions to be used as input for subsequent computational pipelines.

Another, more experimental use of simulation engines is their important role in spearheading the development of computational methods, possibly even before real data is available. In this way, simulation engines can be used to assess the value of novel experimental protocols or treatments. Simulation engines are also increasingly important when it comes to finding alternatives to animal models, for example for drug testing and precision medicine. In such scenarios, cellular simulations can act as digital twins, offering unlimited experimentation in silico2.

Simulating realistic data requires that the underlying biology is recapitulated as best as possible, and in the case of transcriptomics data this typically involves modelling the underlying gene regulatory networks. Simulators of “bulk” microarray or RNA-sequencing profiles simulate biological processes (e.g. transcription, translation) by translating a database of known regulatory interactions into a set of ordinary differential equations (ODE)3,4,5,6. These methods have been instrumental in performing benchmarking studies7,8,9. However, the advent of single-cell omics introduced several new types of analyses (e.g. trajectory inference, RNA velocity, cell-specific network inference) which exploit the higher resolution of single-cell versus bulk omics10. In addition, the data characteristics of single-cell omics are vastly different from bulk omics, typically having much lower library sizes and a higher dropout rate, but also a high number of profiles11. The low library sizes, in particular, are problematic as ODEs are ill-suited for performing low-molecule simulations12. This necessitates the development of single-cell simulators.

To this end, single-cell omics simulators emulate the technical procedures from single-cell omics protocols. Simulators such as Splatter1, powsimR13, PROSSTT14, and SymSim15) have already been widely used to compare single-cell methods16,17,18,19 and perform independent benchmarks20,21,22. However, by focusing more on simulating the single-cell omics protocol (e.g. RNA capture, amplification, sequencing) and less on the underlying biology (e.g. transcription, splicing, translation), their applicability and reusability is limited towards the specific application for which they were designed (e.g. benchmarking clustering or differential expression methods), and extending these tools to include additional modalities or experimental conditions is challenging.

We introduce dyngen, a method for simulating cellular dynamics at a single-cell, single-transcript resolution (Fig. 1). This problem is tackled in three fully configurable main steps. First, biological processes are mimicked by translating a gene regulatory network into a set of reactions (regulation, transcription, splicing, translation). Second, individual cells are simulated using Gillespie’s stochastic simulation algorithm (SSA)12, which is designed to work well in low-molecule simulations. Finally, real reference datasets are used to emulate single-cell omics profiling protocols.

Throughout a simulation, dyngen tracks many layers of information, including the abundance of any molecule in the cell, the progression of the cell along a dynamic process, and the activation strength of individual regulatory interactions. In addition, dyngen can simulate a large variety of dynamic processes (e.g. cyclic, branching, disconnected) as well as a broad range of experimental conditions (e.g. batch effects and time-series, perturbation and single-cell knockdown experiments). For these reasons, dyngen can cater to a wide range of benchmarking applications, including trajectory inference, trajectory alignment, and trajectory differential expression (Supplementary Table 1).

## Results

We demonstrate dyngen’s broad applicability by evaluating three types of computational approaches for which no simulation engines exist yet: cell-specific network inference, trajectory alignment and RNA velocity (Fig. 2). We emphasise that our main aim here is to illustrate the potential of dyngen for these evaluations, rather than performing large-scale benchmarking, which would require assessing many more quantitative and qualitative aspects of each method23.

Use-case “trajectory alignment”. Trajectory alignment methods align trajectories from different samples and allow studying the differences between the different trajectories. For example, by comparing the transcriptomic profiles of cells from a diseased patient to a healthy control, it might be possible to detect transcriptomics differences (differential expression) of particular cells along a developmental process, or to detect an early stop of the trajectory of the diseased patient. Currently, trajectory alignment is limited to aligning linear trajectories, though other topologies of a trajectory could be aligned as well. Dynamic Time Warping (DTW)24 is a method designed for aligning temporal sequences for speech recognition but has since been used to compare gene expression kinetics from many different biological processes25,26,27,28. cellAlign28 uses DTW to perform trajectory alignment, but also includes interpolation and scaling of the single-cell data as a preprocessing step. We evaluate the performance of DTW and cellAlign by simulating 40 datasets, each containing two linear trajectories generated with the same gene regulatory network but with slightly different simulation kinetics. We assess the accuracy of the obtained alignments by comparing the generated alignment path with the worst possible alignment that could be performed (Supplementary Fig. 1D), named the Area Between Worst And Prediction (ABWAP) score. Overall, cellAlign performs significantly better than DTW (Supplementary Fig. 1), which is likely due to the interpolation and scaling steps provided by cellAlign, reducing noise in the data and improving the comparability of the trajectories. Note that, in this comparison, only linear trajectory alignment is performed. While dyngen can generate non-linear trajectories (e.g. cyclic or branching), both aligning non-linear trajectories and constructing a quantitative accuracy metric for non-linear trajectory alignment is not trivial and an avenue for future work.

Use-case “RNA velocity”. RNA velocity methods use the relative ratio between pre-mRNA and mature mRNA reads to predict the rate of increase/decrease of RNA molecule abundance, as this can be used to predict the directionality of single-cell differentiation in trajectories29,30. Already two algorithms are currently available for estimating the RNA velocity vector from spliced and unspliced counts: velocyto30 and scvelo31. Yet, to date, no quantitative assessment of their accuracy has been performed, mainly due to the difficulty in obtaining real ground-truth data to do so. In contrast, the ground-truth RNA velocity can be easily extracted from a dyngen simulation, as it is possible to store the rate at which mRNA molecules are being transcribed and degraded at any particular point in time. We executed velocyto and scvelo (with 2 different parameter settings, stochastic and dynamical) on 42 datasets with a variety of backbones (including linear, bifurcating, cyclic, disconnected). We evaluated the predictions using two metrics (Supplementary Fig. 2), one which directly compares the predicted RNA velocity of each gene with the ground-truth RNA velocity (called the “velocity correlation”), and one which compares the direction of the ground-truth trajectory embedded in a dimensionality reduction with the average RNA velocity of cells in that neighbourhood (called the “velocity arrow cosine”). While both velocyto and scvelo obtained high scores for the velocity arrow cosine metric (overall 25th percentile = 0.606), the velocity correlation is rather low (overall 75th percentile = 0.156). This means that predicting the RNA velocity (i.e. transcription rate minus the decay rate) for particular individual genes can be challenging, but the combined information is very informative in determining the directionality of cell progression in the trajectory. In terms of velocity correlation, no method performed significantly better than the other, whereas “scvelo stochastic” performed slightly worse than “scvelo dynamical” and velocyto in terms of velocity arrow cosine score. Note that, given that some genes are more informative in determining the overall directionality of cell progression, performing a feature selection before computing the embedded dimensionality reduction might result in significantly improved velocity arrow cosine scores.

Use-case “Cell-specific network inference” (CSNI). CSNI methods predict not only which transcription factors regulate which target genes, but also aim to identify how active each interaction is in each of the cells, since interactions can be turned off and on depending on the cellular state. While a few pioneering CSNI approaches have already been developed32,33,34, a quantitative assessment of their performance is until now lacking. This is not surprising, as neither real nor in silico datasets of cell-specific or even cell-type-specific interactions exist that are large enough so that it can be used as a ground-truth for evaluating CSNI methods. Extracting the ground-truth dynamic network in dyngen is straightforward though, given that we can calculate how target gene expression would change without the regulator being present. We used this ground-truth to compare the performance of three CSNI methods (Supplementary Fig. 3): LIONESS33, SSN34, and SCENIC32. For each dataset, we computed the mean Area Under the Receiver Operating Characteristic-curve (AUROC) and Area Under the Precision-Recall curve (AUPR) scores of the individual cells. Comparing the mean AUROC and AUPR showed that pySCENIC significantly outperforms both LIONESS and SSN, and in turn that LIONESS significantly outperforms SSN. The poor performance of SSN is expected, as its methodology for predicting a cell-specific is simply computing the difference in Pearson correlation values applied to the whole dataset and the whole dataset minus one sample. This strategy performs poorly in large datasets where cell correlations are high, as the removal of one cell will not yield large differences in correlation values and will result in mostly noise. Overall, pySCENIC almost always performs better than LIONESS, except for a few datasets where LIONESS does manage to obtain a higher AUROC score. However, by using a different internal network inference (e.g. GENIE335 or pySCENIC’s GRNBoost236) could significantly increase the performance obtained by LIONESS.

## Discussion

dyngen’s single-cell simulations can be used to evaluate common single-cell omics computational methods such as clustering, batch correction, trajectory inference, and network inference. However, the framework is flexible enough to be adaptable to a broad range of applications, including methods that integrate clustering, network inference, and trajectory inference. In this respect, dyngen may promote the development of tools in the single-cell field similarly as other simulators have done in the past5,37. Additionally, one could anticipate technological developments in single-cell multi-omics. In this way, dyngen allows designing and evaluating the performance and robustness of new types of computational analyses before experimental data becomes available, comparing which experimental protocol is the most cost-effective in producing qualitative and robust results in downstream analysis. One major assumption of dyngen is that cells are assumed to be well-mixed and independent from each other. Subdividing a cell into multiple 2D or 3D subvolumes or allowing cells to exchange molecules, respectively, could pave the way to better study key cellular processes such as cell division, intercellular communication, and migration38.

## Methods

The workflow to generate in silico single-cell data consists of six main steps (Supplementary Fig. 4).

### Defining the module network

One of the main processes involved in cellular dynamic processes is gene regulation, where regulatory cascades and feedback loops lead to progressive changes in expression and decision making. The exact way a cell chooses a certain path during its differentiation is still an active research field, although certain models have already emerged and been tested in vivo. One driver of bifurcation is mutual antagonism, where two genes strongly repress each other39,40, forcing one of the two to become inactive41. Such mutual antagonism can be modelled and simulated42,43. Although the two-gene model is simple and elegant, the reality is frequently more complex, with multiple genes (grouped into modules) repressing each other44.

To start a dyngen simulation, the user needs to define a module network. The module network describes how sets of genes regulate each other and is what mainly determines which dynamic processes occur within the simulated cells.

A module network consists of modules connected together by regulatory interactions, which can be either upregulating or downregulating. A module may have basal expression, which means genes in this module will be transcribed without the presence of transcription factor molecules. A module marked as “active during the burn phase” means that this module will be allowed to generate expression of its genes during an initial warm-up phase. At the end of the dyngen process, cells will not be sampled from the burn phase simulations. Interactions between modules have a strength (which is a positive integer) and an effect (+1 for upregulating, −1 for downregulating).

Several examples of module networks are given in Supplementary Fig. 5. A simple chain of modules (where one module upregulates the next) results in a linear process. By having the last module repress the first module, the process becomes cyclic. Two modules repressing each other is the basis of a bifurcating process, though several chains of modules have to be attached in order to achieve progression before and after the bifurcation process. Finally, a converging process has a bifurcation occurring during the burn phase, after which any differences in module regulation is removed.

Note that these examples represent the bare minimum in terms of the number of modules used. Using longer chains of modules is typically desired. In addition, the fate decisions made in this example of a bifurcation is reversible, meaning cells can be reprogrammed to go down a different differentiation path. If this effect is undesirable, more safeguards need to be put in place to prevent reprogramming from occurring.

### Generating the gene regulatory network

The GRN is generated based on the given module network in four main steps (Supplementary Fig. 6).

Step 1, sampling the transcription factors (TF). The TFs are the main drivers of the molecular changes in the simulation. The user provides a backbone and the number of TFs to generate. Each TF is assigned to a module such that each module has at least x parameters (default x = 1). A TF inherits the ‘burn’ and ‘basal expression’ from the module it belongs to.

Step 2, generating the TF interactions. Let each TF be regulated according to the interactions in the backbone. These interactions inherit the effect, strength, and independence parameters from the interactions in the backbone. A TF can only be regulated by other TFs or itself.

Step 3, sampling the target subnetwork. A user-defined number of target genes are added to the GRN. Target genes are regulated by a TF or another target gene, but are always downstream of at least one TF. To sample the interactions between target genes, one of the many FANTOM545 GRNs is sampled. The currently existing TFs are mapped to regulators in the FANTOM5 GRN. The targets are drawn from the FANTOM5 GRN weighted by their page rank value, to create an induced GRN. For each target, at most x regulators are sampled from the induced FANTOM5 GRN (default x = 5). The interactions connecting a target gene and its regulators are added to the GRN.

Step 4, sampling the housekeeping subnetwork. Housekeeping genes are completely separate from any TFs or target genes. A user-defined set of housekeeping genes is also sampled from the FANTOM5 GRN. The interactions of the FANTOM5 GRN are first subsampled such that the maximum in-degree of each gene is x (default x = 5). A random gene is sampled and a breadth-first-search is performed to sample the desired number of housekeeping genes.

### Convert gene regulatory network to a set of reactions

Simulating a cell’s GRN makes use of a stochastic framework which tracks the abundance levels of molecules over time in a discrete quantity. For every gene G, the abundance levels of three molecules are tracked, namely of corresponding pre-mRNAs, mature mRNAs and proteins, which are represented by the terms xG, yG, and zG respectively. The GRN defines how a reaction affects the abundance levels of molecules and how likely it will occur. Gibson and Bruck46 provide a good introduction to modelling gene regulation with stochastic frameworks, on which many of the concepts below are based.

For every gene in the GRN a set of reactions are defined, namely transcription, splicing, translation, and degradation. Each reaction consists of a propensity function—a formula f(.) to calculate the probability f(.) × dt of it occurring during a time interval dt—and the effect—how it will affect the current state if triggered.

The effects of each reaction mimic the respective biological processes (Supplementary Table 2, middle). Transcription of gene G results in the creation of a single pre-mRNA molecule xG. Splicing turns one pre-mRNA xG into a mature mRNA xG. Translation uses a mature mRNA yG to produce a protein zG. Pre-mRNA, mRNA, and protein degradation results in the removal of a xG, yG, and zG molecule, respectively.

The propensity of all reactions except transcription are all linear functions (Supplementary Table 2, right) of the abundance level of some molecule multiplied by a per-gene constant (Supplementary Table 3). The propensity of transcription of a gene G depends on the abundance levels of its TFs. The per-gene and per-interaction constants are based on the median reported production-rates and half-lives of molecules measured of 5000 mammalian genes47, except that the transcription rate has been amplified by a factor of 10.

The propensity of the transcription of a gene G is inspired by thermodynamic models of gene regulation48, in which the promoter of G can be bound or unbound by a set of N transcription factors Hi. Let f(z1, z2, …, zN) denote the propensity function of G, in function of the abundance levels of the transcription factors. The following subsections explain and define the propensity function when N = 1, N = 2, and finally for an arbitrary N.

#### Propensity of transcription when N = 1

In the simplest case when N = 1, the promoter can be in one of two states. In state S0, the promoter is not bound by any transcription factors, and in state S1 the promoter is bound by H1. Each state Sj is linked with a relative activation αj, a number between 0 and 1 representing the activity of the promoter at this particular state. The propensity function is thus equal to the expected value of the activity of the promoter multiplied by the pre-mRNA production rate of G.

$$f({y}_{1},{y}_{2},\ldots ,{y}_{N})=\,\text{xpr}\,\cdot \mathop{\sum }\limits_{j=0}^{{2}^{N}-1}{\alpha }_{j}\cdot P({S}_{j})$$
(1)

For N = 1, P(S1) is equal to the Hill equation, where ki represents the concentration of Hi at half-occupation and ni represents the Hill coefficient. Typically, ni is between [1, 10]

$$P({S}_{1})=\frac{{y}_{1}^{{n}_{1}}}{{k}_{1}^{{n}_{1}}+{y}_{1}^{{n}_{1}}}$$
(2)
$$=\frac{{({y}_{1}/{k}_{1})}^{{n}_{1}}}{1+{({y}_{1}/{k}_{1})}^{{n}_{1}}}$$
(3)

The Hill equation can be simplified by letting $${\nu }_{i}={\left(\frac{{y}_{i}}{{k}_{i}}\right)}^{{n}_{i}}$$.

$$P({S}_{1})=\frac{{\nu }_{1}}{1+{\nu }_{1}}$$
(4)

Since P(S0) = 1 − P(S1), the activation function is formulated and simplified as follows.

$$f({y}_{1})=\,\text{xpr}\,\cdot \left({\alpha }_{0}\cdot P({S}_{0})+{\alpha }_{1}\cdot P({S}_{1})\right)$$
(5)
$$=\,\text{xpr}\,\cdot \left({\alpha }_{0}\cdot \frac{1}{1+{\nu }_{1}}+{\alpha }_{1}\cdot \frac{{\nu }_{1}}{1+{\nu }_{1}}\right)$$
(6)
$$=\,\text{xpr}\,\cdot \frac{{\alpha }_{0}+{\alpha }_{1}\cdot {\nu }_{1}}{1+{\nu }_{1}}$$
(7)

#### Propensity of transcription when N = 2

When N = 2, there are four states Sj. The relative activations αj can be defined such that H1 and H2 are independent (additive) or synergistic (multiplicative). In order to define the propensity of transcription f(.), the Hill equation P(Sj) is extended for two transcription factors.

Let wj be the numerator of P(Sj), defined as the product of all transcription factors bound in that state:

$${w}_{0}=1$$
(8)
$${w}_{1}={\nu }_{1}$$
(9)
$${w}_{2}={\nu }_{2}$$
(10)
$${w}_{3}={\nu }_{1}\cdot {\nu }_{2}$$
(11)

The denominator of P(Sj) is then equal to the sum of all wj. The probability of state Sj is thus defined as:

$$P({S}_{j})=\frac{{w}_{j}}{\mathop{\sum }\nolimits_{j = 0}^{j\,{<}\,{2}^{N}}{w}_{j}}$$
(12)
$$=\frac{{w}_{j}}{1+{\nu }_{1}+{\nu }_{2}+{\nu }_{1}\cdot {\nu }_{2}}$$
(13)
$$=\frac{{w}_{j}}{\mathop{\prod }\nolimits_{i = 1}^{i\le N}({\nu }_{i}+1)}$$
(14)

Substituting P(Sj) and wj into f(.) results in the following equation:

$$f({y}_{1},{y}_{2})=\,\text{xpr}\,\cdot \mathop{\sum }\limits_{j=0}^{{2}^{N}-1}{\alpha }_{j}\cdot P({S}_{j})$$
(15)
$$=\,\text{xpr}\,\cdot \frac{\mathop{\sum }\nolimits_{j = 0}^{{2}^{N}-1}{\alpha }_{j}\cdot {w}_{j}}{\mathop{\prod }\nolimits_{i = 1}^{i\le N}({\nu }_{i}+1)}$$
(16)
$$=\,\text{xpr}\,\cdot \frac{{\alpha }_{0}+{\alpha }_{1}\cdot {\nu }_{1}+{\alpha }_{2}\cdot {\nu }_{2}+{\alpha }_{3}\cdot {\nu }_{1}\cdot {\nu }_{2}}{({\nu }_{1}+1)\cdot ({\nu }_{2}+1)}$$
(17)

#### Propensity of transcription for an arbitrary N

For an arbitrary N, there are 2N states Sj. The relative activations αj can be defined such that H1 and H2 are independent (additive) or synergistic (multiplicative). In order to define the propensity of transcription f(.), the Hill equation P(Sj) is extended for N transcription factors.

Let wj be the numerator of P(Sj), defined as the product of all transcription factors bound in that state:

$${w}_{j}=\mathop{\prod }\limits_{i=1}^{i\le N}(j\,\text{mod}\,i)=1\,\text{?}{\nu }_{i}\text{:}\,1$$
(18)

The denominator of P(Sj) is then equal to the sum of all wj. The probability of state Sj is thus defined as:

$$P({S}_{j})=\frac{{w}_{j}}{\mathop{\sum }\nolimits_{j = 0}^{j\,{<}\,{2}^{N}}{w}_{j}}$$
(19)
$$=\frac{{w}_{j}}{\mathop{\prod }\nolimits_{i = 1}^{i\le N}({\nu }_{i}+1)}$$
(20)

Substituting P(Sj) into f(.) yields:

$$f({y}_{1},{y}_{2},\ldots ,{y}_{N})=\,\text{xpr}\,\cdot \mathop{\sum }\limits_{j=0}^{{2}^{N}-1}{\alpha }_{j}\cdot P({S}_{j})$$
(21)
$$=\,\text{xpr}\,\cdot \frac{\mathop{\sum }\nolimits_{j = 0}^{{2}^{N}-1}{\alpha }_{j}\cdot {w}_{j}}{\mathop{\prod }\nolimits_{i = 1}^{i\le N}({\nu }_{i}+1)}$$
(22)

#### Propensity of transcription for a large N

For large values of N, computing f(.) is practically infeasible as it requires performing 2N summations. In order to greatly simplify f(.), αj could be defined as 0 when one of the regulators inhibits transcription and 1 otherwise.

$${\alpha }_{j}=\left\{\begin{array}{l}0\ \,\text{if}\,\ \exists \ i:j\ \,\text{mod}\,\ i=1\ \,\text{and}\,\ {H}_{i}\ \,\text{represses}\,\ G\\ 1\ \,\text{otherwise}\,\end{array}\right.$$
(23)

Substituting Eq. (23) into Eq. (22) and defining R = {1, 2, …, N} and R+ = {iHiactivatesG} yields the simplified propensity function:

$$f({y}_{1},{y}_{2},\ldots ,{y}_{N})=\,\text{xpr}\,\cdot \frac{{\prod }_{i\in {R}^{+}}({\nu }_{i}+1)}{{\prod }_{i\in R}({\nu }_{i}+1)}$$
(24)

#### Independence, synergism, and basal expression

The definition of αj as in Eq. (23) presents two main limitations. Firstly, since α0 = 1, it is impossible to tweak the propensity of transcription when no transcription factors are bound. Secondly, it is not possible to tweak the independence and synergism of multiple regulators.

Let ba [0, 1] denote the basal expression strength G (i.e. how much will G be expressed when no transcription factors are bound), and sy [0, 1] denote the synergism of regulators Hi of G, the transcription propensity becomes:

$$f({y}_{1},{y}_{2},\ldots ,{y}_{N})=\,\text{xpr}\cdot \frac{\text{ba}-{\text{sy}}^{| {R}^{+}| }+{\prod }_{i\in {R}^{+}}({\nu }_{i}+\,\text{sy}\,)}{{\prod }_{i\in R}({\nu }_{i}+1)}$$
(25)

### Simulate single cells

dyngen uses Gillespie’s SSA12 to simulate dynamic processes. An SSA simulation is an iterative process where at each iteration one reaction is triggered.

Each reaction consists of its propensity—a formula to calculate the probability of the reaction occurring during an infinitesimal time interval—and the effect—how it will affect the current state if triggered. Each time a reaction is triggered, the simulation time is incremented by $$\tau =\frac{1}{{\sum }_{j}{{\rm{prop}}}_{j}}{\mathrm{ln}}\,\left(\frac{1}{r}\right)$$, with r U (0, 1) and propj the propensity value of the jth reaction for the current state of the simulation.

GillespieSSA2 is an optimised library for performing SSA simulations. The propensity functions are compiled to C++ and SSA approximations can be used which allow triggering many reactions simultaneously at each iteration. The framework also allows storing the abundance levels of molecules only after a specific interval has passed since the previous census. By setting the census interval to 0, the whole simulation’s trajectory is retained but many of these time points will contain very similar information. In addition to the abundance levels, also the propensity values and the number of firings of each of the reactions at each of the time steps can be retained.

### Simulate experiment

From the SSA simulation we obtain the abundance levels of all the molecules at every state. We need to replicate technical effects introduced by experimental protocols in order to obtain data that is similar to real data. For this, the cells are sampled from the simulations and molecules are sampled for each of the cells. Gene capture rates and library sizes are empirically derived from real datasets to match real technical variation.

#### Sample cells

In this step, N cells are sampled from the simulations. Two approaches are implemented: sampling from an unsynchronised population of single cells (snapshot) or sampling at multiple time points in a synchronised population (time series).

Snapshot. The backbone consists of several states linked together by transition edges with length Li, to which the different states in the different simulations have been mapped (Supplementary Fig. 7A). From each transition, $${N}_{i}=N/\frac{{L}_{i}}{\sum {L}_{i}}$$ cells are sampled uniformly, rounded such that ∑Ni = N.

Time series. Assuming that the final time of the simulation is T, the interval [0, T] is divided into k equal intervals of width w separated by k − 1 gaps of width g. Ni = N/k cells are sampled uniformly from each interval (Supplementary Fig. 7B), rounded such that ∑Ni = N. By default, k = 8 and g = 0.75. For usual dyngen simulations, 10 ≤ T ≤ 20. For larger values of T, k and g should be increased accordingly.

#### Sample molecules

Molecules are sampled from the simulation to replicate how molecules are experimentally sampled. A real dataset is downloaded from a repository of single-cell RNA-seq datasets49. For each in silico cell i, draw its library size lsi from the distribution of transcript counts per cell in the real dataset. The capture rate crj of each in silico molecule type j is drawn from N(1, 0.05). Finally, for each cell i, draw lsi molecules from the multinomial distribution with probabilities crj × abi,j with abi,j the molecule abundance level of molecule j in cell i.

#### Comparison between a dyngen and a reference dataset

Comparison between a dyngen dataset and the reference dataset it used in terms of characteristic single-cell omics features showed that dyngen produces datasets with highly similar data characteristics (Supplementary Note 1). Supplementary Note 1 was generated using countsimQC50.

### Simulating batch effects

Simulating batch effects can be performed in multiple ways. One such way is to perform the first two steps of the creation of a dyngen model (defining the module network and generating the GRN). For each desired batch, create a separate model for which random kinetics are generated and perform all subsequent dyngen steps (convert to reactions, simulate gold standard, simulate single cells, simulate experiment). Since each separate model has different underlying kinetics, the combined output will resemble having batch effects.

### Determining the ground-truth trajectory

To construct the ground-truth trajectory, the user needs to provide the ground-truth state network alongside the initial module network (Supplementary Fig. 8). Each edge in the state network specifies which modules are allowed to change in expression in transitioning from one state to another. For each edge, a simulation is run using the end state of an upstream branch as the initial expression vector, and only allowing the modules as predefined by the attribute to change.

As an example, consider the cyclic trajectory shown in Supplementary Fig. 8. State S0 begins with an expression vector of all zero values. To simulate the transition from S0 to S1, regulation of the genes in modules A, B, and C are turned on. After a predefined period of time, the end state of this transition is considered the expression vector of state S1. To simulate the transition from S1 to S2, regulation of the genes in modules D and E are turned on, while the regulation of genes in module C is turned off. During this simulation, the expression of genes in modules A, B, D, and E is thus allowed to change. The end state of the simulation is considered the expression vector of state S2.

For each of the branches in the state network, an expression matrix and the corresponding progression time along that branch are retained. To map a simulated cell to the ground-truth, the correlation between its expression values and the expression matrix of the ground-truth trajectory is calculated, and the cell is mapped to the position in the ground-truth trajectory that has the highest correlation.

### Determining the cell-specific ground-truth regulatory network

Calculating the regulatory effect of a regulator R on a target T (Supplementary Fig. 4F) requires determining the contribution of R in the propensity function of the transcription of T with respect to other regulators. This information is useful, amongst others, for benchmarking cell-specific network inference methods.

The regulatory effect of R on T at a particular state S is defined as the change in the propensity of transcription when R is set to zero, scaled by the inverse of the pre-mRNA production rate of T. More formally:

$${\text{regeffect}}_{G}=\frac{{\text{proptrans}}_{G}(S)-{\text{proptrans}}_{G}(S[{\text{z}}_{T}\leftarrow 0])}{{\text{xpr}}_{G}}$$
(26)

Determining the regulatory effect for all interactions and cells in the dataset yields the complete cell-specific ground-truth GRN. The regulatory effect lies between [−1, 1], where −1 represents complete inhibition of T by R, 1 represents maximal activation of T by R, and 0 represents inactivity of the regulatory interaction between R and T.

### Comparison of cell-specific network inference methods

42 datasets were generated using the 14 different predefined backbones and three different seeds. For every cell in the dataset, the transcriptomics profile and the corresponding cell-specific ground-truth regulatory network was determined.

We selected three cell-specific NI methods: SCENIC32, LIONESS33,51, and SSN34.

LIONESS33 runs a NI method multiple times to construct cell-specific GRNs. LIONESS first infers a GRN with all of the samples. A second GRN is inferred with all samples except one particular profile. The cell-specific GRN for that particular profile is defined as the difference between the two GRN matrices. This process is repeated for all profiles, resulting in a cell-specific GRN. By default, LIONESS uses PANDA52 to infer GRNs, but since dyngen does not produce motif data and motif data is required by PANDA, PANDA is inapplicable in this context. Instead, we used the lionessR53 implementation of LIONESS, which uses by default the Pearson correlation as a NI method. We marked results from this implementation as “LIONESS + Pearson”.

SSN34 follows, in essence, the exact same methodology as LIONESS except that it specifically only uses the Pearson correlation. It is worth noting that the LIONESS preprint was released before the publication of SSN. Since no implementation was provided by the authors, we implemented SSN in R using basic R and tidyverse functions54 and marked results from this implementation as “SSN*”.

SCENIC32 consists of four main steps. First, classical network inference is performed with stochastic gradient boosting machines using arboreto36. Second, the top 10 regulators of every target gene are selected. Interactions are grouped together in ‘modules’; each module contains one regulator and all of its targets. Next, the modules are filtered using motif analysis. Finally, for each module and each cell, an activity score is calculated using AUCell. As a post-processing of this output, all modules and the corresponding activity scores are combined back into a cell-specific GRN consisting of (cell, regulator, target, score) pairs. For this analysis, the Python implementation of SCENIC was used, namely pySCENIC55. Since dyngen does not generate motif data, step 3 in this analysis is skipped.

The AUROC and AUPR metrics are common metrics for evaluating a predicted GRN with a ground-truth GRN56. To compare a predicted cell-specific GRN with the ground-truth cell-specific GRN, the top 10,000 interactions per cell is retained, and the mean AUROC and AUPR scores are calculated across all cells.

We compared the mean AUROC and AUPR scores obtained by the three CSNI methods across all datasets by performing pairwise non-parametric paired two-sided Durbin–Conover tests57 using pairwiseComparisons58. Test statistics and p values for the all pairwise combinations are reported in the Source Data file. Reported p values are adjusted for multiple testing using Holm correction59.

### Comparison of RNA velocity methods

Three datasets were generated for each of the 14 different predefined backbones, resulting in a collection of 42 datasets. Throughout each of the simulation, the propensity of the transcription and mRNA decay is collected, as the RNA velocity of a gene at any point in the simulation is the difference between the transcription propensity and the mRNA decay propensity.

We applied two RNA velocity methods: velocyto30, as implemented in the velocyto.py package, and scvelo method31, as implemented in the scvelo package. For scvelo, we chose two parameter settings for “mode”, namely “stochastic” and “dynamical”. For both methods, we used the same normalised data as provided by dyngen, with no extra cell or feature filtering, but otherwise matched the parameters to their respective tutorial vignettes as well as possible.

We compared each RNA velocity prediction to the ground-truth using two metrics: the velocity correlation and the velocity arrow cosine. For the velocity correlation, we extracted a ground truth RNA velocity by subtracting for each mRNA molecule the propensity of its production by the propensity of its degradation. If the expression of an mRNA will increase in the future, this value is positive, while it is negative if it is going to decrease. For each gene, we determined its velocity correlation by calculating the Spearman rank correlation between the ground truth velocity with the observed velocity. For the velocity arrow cosine, we determined a set of 100 trajectory waypoints uniformly spread on the trajectory. For each waypoint, we weighted each cell based on a Gaussian kernel on its geodesic distance from the waypoint. These weights were used to calculate a weighted average velocity vector of each waypoint. We then calculated for each waypoint the cosine similarity between this velocity vector and the known direction of the trajectory.

We compared the velocity correlation and velocity arrow cosine scores obtained by velocyto and scvelo across all datasets by performing pairwise non-parametric paired two-sided Durbin–Conover tests57 using pairwiseComparisons58. Test statistics and p values for the all pairwise combinations are reported in the Source Data file. Reported p values are adjusted for multiple testing using Holm correction59.

### Comparison of trajectory alignment

Four custom linear backbones of varying sizes were constructed. For each of these backbones, 10 datasets were generated with 10 different seeds, resulting in a total of 40 datasets. Every dataset is generated in three main steps. First, the GRN is generated based on the given backbone. Next, generating the kinetics, gold standard, and cells is performed twice, resulting in two sub-datasets. Finally, the two sub-datasets are combined and cells are sampled from the combined dataset. Since the two sub-datasets were simulated with different kinetic parameters, the combined dataset will contain two trajectories.

On each combined dataset we applied two trajectory alignment methods, DTW24 and cellAlign28. DTW is designed to align temporal sequences by dilating or contracting the sequences to best match each other. cellAlign uses DTW to perform this alignment, but first interpolates and rescales the input data in order to better cope with single-cell omics data.

To evaluate a trajectory alignment method on a combined dataset we computed the geodesic distances of each cell from the start of the trajectory, also called the pseudotime. For each dataset, the pseudotime values are rescaled between 0 and 1 to allow for easier comparison. A trajectory alignment produces a sequence of index pairs [(i0, j0), (i1, j1), …, (iN, jN)], where i0 and j0 are equal to 0 (the first position in both pseudotime series), iN and jN are equal to the respective last positions in the pair of pseudotime series, and [i0, i1, …, iN] and [j0, j1, …, jN] are in ascending order and can contain duplicates values. The ABWAP metric is defined as follows, where pt1 and pt2 are the unit pseudotime vectors. See Supplementary Fig. 1D for a visual interpretation of this metric.

$${\rm{ABWAP}}=1-{\rm{area}}\_{\rm{under}}\_{\rm{curve}}(p{t}_{1}[{i}_{0}..{i}_{N}]+p{t}_{2}[{j}_{0}..{j}_{N}],abs(p{t}_{1}[{i}_{0}..{i}_{N}]-p{t}_{2}[{j}_{0}..{j}_{N}]))$$
(27)

We compared the ABWAP scores obtained by DTW and cellAlign across all datasets by performing pairwise non-parametric paired two-sided Durbin–Conover tests57 using pairwiseComparisons58. Test statistics and p values for the all pairwise combinations are reported in the Source Data file. Reported p values are adjusted for multiple testing using Holm correction59.

### Comparison of scalability and runtime

Simulating a bifurcating cycle dataset with 10,000 genes and 10,000 required in total 1147 s (Supplementary Note 2). Fixing the number of genes and varying the number of cells showed that the execution time of dyngen scales linearly w.r.t. the number of cells (Supplementary Note 2). Fixing the number of cells and varying the number of genes also showed that the execution time of dyngen scales linearly w.r.t. the number of genes (Supplementary Note 2). These timings were measured using 30 (out of 32) threads using a AMD Ryzen 9 5950X clocked at 3.4GHz.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.