Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells

We present dyngen, a multi-modal simulation engine for studying dynamic cellular processes at single-cell resolution. dyngen is more flexible than current single-cell simulation engines, and allows better method development and benchmarking, thereby stimulating development and testing of computational methods. We demonstrate its potential for spearheading computational methods on three applications: aligning cell developmental trajectories, cell-specific regulatory network inference and estimation of RNA velocity.


Response to reviewers' comments dyngen rebuttal
We would like to thank the reviewer for the constructive comments, which has motivated us to greatly improve the experiments performed in this work and the documentation provided for the tool.Notable changes are: • We simplified and reran all use-case experiments, which allowed us to add some interpretation of the results for each of the use cases.• We performed new experiments assessing the scalability of dyngen and the similarity in output in comparison to a reference (real) dataset.The results are included in the supplementary material.• We created a documentation website available at dyngen.dynverse.org,which contains not only the documentation on how the package can be used, but also material created based on suggestions by the reviewer, namely expanded installation instructions and experiments on scalability and comparison to reference datasets.• We added conversion functions for outputting the dyngen output to one of the many commonly used single-cell omics formats: anndata, dyno, SingleCellExperiment, Seurat.

Comment 1
Bioinformatics has been looking into GRNs, their simulation and estimation for more than a decade (in bulk at the time), and I miss some important references and insights from then.Certainly simulating expression datasets using GRNs has been around for a long time and dyngen is inspired heavily by such software (GeneNetWeaver in particular).Nevertheless, there are great differences between bulk and single-cell simulators: the data characteristics of single-cell omics are vastly different from bulk omics, typically having much lower library sizes and a higher dropout rate, but also a high number of profiles.The low library sizes, in particular, are problematic as ODEs are ill-suited for performing low-molecule simulations.This necessitates the development of new single-cell simulators.
We have added paragraph §3 to the introduction to discuss the relevance of bulk simulators and why they are not suitable for simulating single-cell expression data: Simulating realistic data requires that the underlying biology is recapitulated as best as possible, and in the case of transcriptomics data this typically involves modelling the underlying gene regulatory networks.Simulators of "bulk" microarray or RNA-sequencing profiles simulate biological processes (e.g.transcription, translation) by translating a database of known regulatory interactions into a set of ordinary differential equations (ODE) [1,2,3,4].These methods have been instrumental in performing benchmarking studies [5,6,7].However, the advent of single-cell omics introduced several new types of analyses (e.g.trajectory inference, RNA velocity, cell-specific network inference) which exploit the higher resolution of single-cell versus bulk omics [8].In addition, the data characteristics of single-cell omics are vastly different from bulk omics, typically having much lower library sizes and a higher dropout rate, but also a high number of profiles [9].The low library sizes, in particular, are problematic as ODEs are ill-suited for performing low-molecule simulations [10].This necessitates the development of new single-cell simulators.

Comment 2
Similarly, the single cell genomics field has used simulations also for dynamics for some time as evaluation at least implicitely.Please discuss advances and differences or similarities to those, eg. to simulations from (Ocone, Bioinf 2015) or (Pratapa, Nat Meth 2020).
We expanded upon paragraph §4 and made a more direct comparison to the methods listed in Supplemental Table 1.Paragraph §4: To this end, single-cell omics simulators emulate the technical procedures from singlecell omics protocols.Simulators such as Splatter [11], powsimR [12], PROSSTT [13] and SymSim [14]) have already been widely used to compare single-cell methods [15,16,17,18] and perform independent benchmarks [19,20,21].However, by focusing more on simulating the single-cell omics protocol (e.g.RNA capture, amplification, sequencing) and less on the underlying biology (e.g.transcription, splicing, translation), their applicability and reusability is limited towards the specific application for which they were designed (e.g.benchmarking clustering or differential expression methods), and extending these tools to include additional modalities or experimental conditions is challenging.
We did not include the reference to Ocone Bioinf 2015 as the advances proposed by this research were mainly related to performing Network Inference.In terms of generating synthetic data, the simulator used by Ocone et al. offered little advantage over aforementioned bulk simulators.We also did not include a reference to Pratapa et al., as their methodology is borrowed heavily from dyngen itself and offers little advantage over it.

Comment 3
One of the main aims of any simulation framework should be to capture features of real biological data.I'd be great if the authors could demonstrate the fidelity of the generated data by comparing it to some real datasets, e.g.distribution of mean gene expression and number of zero counts per gene.How does it perform compared to other simulation frameworks?This would help me as well as community to evaluate how realistic the induced gene correlations/dependencies are, and what may be missing.
A vignette has been added to the package and supplementary material which uses countsimQC [22] to explore the characteristics of an dyngen simulation and the reference dataset used by it (dyngen.dynverse.org/articles/advanced_topics/comparison_reference.html).The distribution plots in this vignette, while not perfect, nicely demonstrates that the output generated by dyngen mimics the given reference dataset for the evaluated metrics.
We modified paragraphs §5 and §6 to better highlight the scientific contributions dyngen makes: We introduce dyngen, a method for simulating cellular dynamics at a single-cell, singletranscript resolution (Figure 1).This problem is tackled in three fully-configurable main steps.First, biological processes are mimicked by translating a gene regulatory network in a set of reactions (regulation, transcription, splicing, translation).Second, individual cells are simulated using Gillespie's stochastic simulation algorithm [10], which is designed to work well in low-molecule simulations.Finally, real reference datasets are used to emulate single-cell omics profiling protocols.
Throughout a simulation, dyngen tracks many layers of information, including the abundance of any molecule in the cell, the progression of the cell along a dynamic process, and the activation strength of individual regulatory interactions.In addition, dyngen can simulate a large variety of dynamic processes (e.g.cyclic, branching, disconnected) as well as a broad range of experimental conditions (e.g.batch effects and time-series, perturbation and single-cell knockdown experiments).For these reasons, dyngen can cater to a wide range of benchmarking applications, including trajectory inference, trajectory alignment, and trajectory differential expression (Table S1).
To the best of our knowledge, the works listed above use very different approaches to benchmarking their tool: • LIONESS, network inference method: Uses randomly generated networks & boolean networks to generate synthetic data.The networks used to generate datasets are very synthetic and completely devoid of any biological relevance.By using Gillespie's SSA algorithm instead of boolean networks, we can use dyngen to track many layers of information (e.g. the number of times a biological reaction has been triggered during a given time interval).This allows us to benchmark methods which require this information (e.g.RNA velocity).• cellAlign, trajectory alignment: Perturbs real datasets to assess the robustness of cellAlign.
When benchmarking a tool, assessing its robustness is almost as important as assessing its accuracy.Robustness analysis can be performed without knowing the gold standard, whereas the gold standard is needed for assessing the accuracy of a tool.• velocyto, RNA velocity: Uses ODEs to simulate splicing mechanics.The code for reproducing these datasets couldn't be found.Based on the description, these simulations were limited to simulating only the splicing mechanics under a particular set of conditions, thereby limiting its applicability outside of this particular context.• scvelo, RNA velocity: Same as velocyto.
• dynamo, RNA velocity: Uses Gillespie's SSA to simulate splicing reactions.However, this system is very limited, including only two genes.This application of Gillespie's SSA seems very tailored towards this particular use-case and would require major redesigns to be made applicable elsewhere.
It should be noted that from the list above, only LIONESS really compares its accuracy performance with other methods using a quantitative metric.We would thus argue that these cases further highlight the need for broadly applicable synthetic data generators such as dyngen.

Comment 5
The authors evaluate several computational tools, including the afore-mentioned.Even if the authors mention not to aim for a large-scale benchmarking, they still should provide some insights into why some tools perform better than others.Having an interpretable simulation engine in hand, it is unfortunate that it yet lacks any explanation and the reader is left alone with performance statements that are not well-founded.Some intuition would be very desirable.Why does SCENIC outperform the other methods?Why does smoothing improve trajectory alignment?And what causes differences in velocity estimation?Could you, for instance, generate an example where the differences in velocity estimates become more obvious, where one estimation procedure fails to fully capture the known direction?
Performing in-depth benchmarks on each of these use-cases can be considered separate studies on their own and is beyond the scope of this work.However, we did reanalyse the experiments and added interpretation to the respective sections.

Trajectory alignment
We removed DTW+smoothing (as it was a custom method developed for the sake of this analysis) and added cellAlign to the comparison, as cellAlign also uses smoothing and DTW to get to the same objective, and has been developed specifically with single cell data in mind.
Interpretation added to manuscript: Overall, cellAlign performs significantly better than DTW (Figure S1, which is likely due to the interpolation and scaling steps provided by cellAlign, reducing noise in the data and improving the comparability of the trajectories.Note that, in this comparison, only linear trajectory alignment is performed.While dyngen can generate non-linear trajectories (e.g.cyclic or branching), both aligning non-linear trajectories and constructing a quantitative accuracy metric for non-linear trajectory alignment is not trivial and an avenue for future work.

RNA velocity
We removed some parameter settings which are designed to be very similar and thus have no added benefit in the visualisation.We removed the 'difficulty' settings for the different datasets, as the 'easy' and 'medium' datasets were unrealistic; only the 'hard' settings were retained.While there are no large difference in performance between the different methods, we added interpretation on why methods obtain low scores on one metric and high scores on the other.
Interpretation added to manuscript: While both velocyto and scvelo obtained high scores for the velocity arrow cosine metric (overall 25th percentile = 0.606), the velocity correlation is rather low (overall 75th percentile = 0.156).This means that predicting the RNA velocity (i.e.transcription rate minus the decay rate) for any particular gene is challenging, but the combined information is very informative in determining the directionality of cell progression in the trajectory.In terms of velocity correlation, no method performed significantly better than the other, whereas "scvelo stochastic" performed slightly worse than "scvelo dynamical" and velocyto in terms of velocity arrow cosine score.

Cell-specific network inference
We added interpretation on why certain methods significantly outperform others and provide possible solutions to problems, where possible.
Interpretation added to manuscript: Comparing the mean AUROC and AUPR showed that pySCENIC significantly outperforms both LIONESS and SSN, and in turn that LIONESS significantly outperforms SSN.The poor performance of SSN is expected, as its methodology for predicting a cell-specific is simply computing the difference in Pearson correlation values applied to the whole dataset and the whole dataset minus one sample.This strategy performs poorly in large datasets where cell correlations are high, as the removal of one cell will not yield large differences in correlation values and will result in mostly noise.Overall, pySCENIC almost always performs better than LIONESS, except for a few datasets where LIONESS does manage to obtain a higher AUROC score.However, by using a different internal network inference (e.g.GENIE3 [23] or pySCENIC's GRNBoost2 [24]) could significantly increase the performance obtained by LIONESS.

Minor comment 1
Could the authors please discuss runtime and scalability, which obviously can easily become a weakness for such a complex model.
A vignette has been added to the package and supplementary material explores the execution time of dyngen for an increasing number of cells and genes (dyngen.dynverse.org/articles/advanced_topics/scalability_and_rThese figures show that the execution time of dyngen scales linearly w.r.t. the number of cells and also linearly w.r.t. the number of genes.

Minor comment 2
Please state how many genes have been used in the examples.It seems like the number of genes used in the examples is rather small (< 100).Wouldn't it be better to use a more realistic number of features (~10,000)?
Unfortunately, the examples shown in the package documentation need to be very small to abide by CRAN package submission policies, as the complete testing of the software package (including but not limited to running all examples) can strictly take 10 minutes on CRAN's server using only one thread.
However, the scalability vignette mentioned above also shows the execution time of each of the steps taken by dyngen in simulating a 10'000 by 10'000 dataset, which takes about 20 minutes to complete.

Minor comment 3
Congratulations for making the clearly documented package available.Installation and going through the vignette works fine.At several points during the simulation files are downloaded without asking permission.It may be better practice to ask the user for permission, or download into a temporary directory, or read the files directly without downloading, or include in the package etc.
By default, required files are downloaded to a temporary directory.Unfortunately, these files cannot be included inside the package as they are reference datasets which are much larger than what is allowed by CRAN policies.
Based on the reviewers suggestion, we modified the installation instructions such that it better reflects that files will be downloaded when using dyngen, and provides instructions on how to configure the system such that it will cache these files in a user-specified directory (dyngen.dynverse.org/articles/installation.html).

Minor comment 4
Further, it might be very useful to output the simulated data to a standard object, such as Seurat or SingleCellExperiment, or to provide instructions on how to convert it.
Thank you for this suggestion.We agree that conversion functions from the typical dyngen output to the typical object types familiar to single-cell omics programmers would be very useful.Therefore, we included such functions in the latest version of dyngen to convert its output to the following formats: anndata, dyno, SingleCellExperiment, Seurat.

REVIEWERS' COMMENTS
Reviewer #2 (Remarks to the Author): All points were addressed very well, thank you!In particular, the scalability analysis and improved documentation are useful for the community, and very appreciated.
I have two minor points left, which I think the authors can address quickly: -Results changed for velocity benchmarks -are these robust overall or is it due to the difficulty settings?-"This means that predicting the RNA velocity for any particular gene is challenging, but the combined information is very informative…".I agree that the combined information can be informative even if not all genes yield a robust fit.Yet, this is not necessarily because a per-gene fit is challenging, but imho rather due to low intrinsic dimensionality, as only a subset of genes is informative after all.It is also not clear, whether one can make that statement from the correlation metric as, e.g., if correlating two expression profiles, one would also generally obtain rather low correlation scores.