RaceID3 outperforms RCA [1], SC3 [2], Seurat [3], and ICGS [4] in recovering domains of marker gene expression. RaceID3 was run with and without random forests-based reclassification (rf). All clustering methods except ICGS, which has no parameter for adjusting sensitivity, were run with different parameter settings to vary sensitivity and obtain different cluster numbers (see Online methods). This strategy yielded overlapping ranges of cluster numbers across these methods. Shown are the maximum log2-transformed fold-enrichment (left panel) and the entropy of the distribution of mean expression (right panel) of a given lineage marker gene across all detected clusters, as a function of cluster number. Compared with all other tested methods, the fold-enrichment of the RaceID3 predictions is substantially higher for most lineage markers and of similar magnitude to that of the best-performing methods for the remaining ones. At the same time, the entropy as a function of cluster number is consistently lower for RaceID3 than for the other methods for most marker genes and follows a similar trend for the remaining ones. In conclusion, the benchmarking demonstrates that RaceID3 optimizes the overlap of known marker gene expression domains with predicted cell types.
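The two benchmark quantities named above can be illustrated with a minimal sketch. The exact definitions are given in the Online methods and are not reproduced here, so the following rests on assumptions: fold-enrichment is taken as the ratio of a cluster's mean expression to the mean over all cells, and the entropy is the Shannon entropy (in bits) of the cluster means normalized to sum to one. The function names and the `pseudocount` parameter are hypothetical and are not part of RaceID3 or any of the compared packages.

```python
import math
from collections import defaultdict


def cluster_means(expr, labels):
    """Mean expression of one marker gene within each cluster."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(expr, labels):
        sums[cluster] += value
        counts[cluster] += 1
    return {c: sums[c] / counts[c] for c in sums}


def max_log2_fold_enrichment(expr, labels, pseudocount=0.1):
    """Largest log2 ratio of a cluster mean to the mean over all cells.

    Assumption: the pseudocount is a hypothetical guard against log2(0);
    the source does not state how zero means are handled.
    """
    overall = sum(expr) / len(expr)
    means = cluster_means(expr, labels)
    return max(
        math.log2((m + pseudocount) / (overall + pseudocount))
        for m in means.values()
    )


def cluster_entropy(expr, labels):
    """Shannon entropy (bits) of the normalized cluster-mean distribution.

    Assumption: cluster means are rescaled to sum to one before the
    entropy is computed. Clusters with zero mean contribute nothing.
    """
    means = cluster_means(expr, labels)
    total = sum(means.values())
    if total <= 0:
        raise ValueError("gene is not expressed in any cluster")
    probs = [m / total for m in means.values() if m > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy data: four cells in two clusters, marker enriched in cluster 0.
expr = [6.0, 6.0, 2.0, 2.0]
labels = [0, 0, 1, 1]
enrichment = max_log2_fold_enrichment(expr, labels, pseudocount=0.0)
entropy = cluster_entropy(expr, labels)
```

Under these assumed definitions, a marker confined to a single cluster gives a high fold-enrichment and an entropy near zero, whereas a marker spread evenly over k clusters gives an entropy of log2(k). Both quantities depend on the number of clusters, which is why they are compared as a function of cluster number.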
1. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
2. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
3. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
4. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).