  • Analysis

Benchmarking algorithms for single-cell multi-omics prediction and integration

Abstract

The development of single-cell multi-omics technology has greatly enhanced our understanding of biology, and in parallel, numerous algorithms have been proposed to predict the protein abundance and/or chromatin accessibility of cells from single-cell transcriptomic information and to integrate various types of single-cell multi-omics data. However, few studies have systematically compared and evaluated the performance of these algorithms. Here, we present a benchmark study of 14 protein abundance/chromatin accessibility prediction algorithms and 18 single-cell multi-omics integration algorithms using 47 single-cell multi-omics datasets. Our benchmark showed that, overall, totalVI and scArches outperformed the other algorithms in predicting protein abundance, and that LS_Lab was the top-performing algorithm for predicting chromatin accessibility in most cases. Seurat, MOJITOO and scAI emerged as the leading algorithms for vertical integration, whereas totalVI and UINMF outperformed their counterparts in both horizontal and mosaic integration scenarios. Additionally, we provide a pipeline to assist researchers in selecting the optimal multi-omics prediction and integration algorithm.

Fig. 1: Workflow and multi-omics datasets for benchmarking.
Fig. 2: Performance of 11 algorithms in predicting protein abundance from RNA expression.
Fig. 3: Performance of nine algorithms in predicting chromatin accessibility information from RNA expression.
Fig. 4: Benchmarking results for vertical integration.
Fig. 5: Benchmarking results for horizontal integration.
Fig. 6: Benchmarking results for mosaic integration.

Data availability

The multi-omics datasets used in the benchmark study, with their sequencing technologies and the locations of the raw data, are as follows: dataset 1 (human BMMCs): CITE-seq, GSE128639 (ref. 5); dataset 2 (human BMMCs): CITE-seq, GSE194122 (ref. 79); dataset 3 (human brain immune cells): CITE-seq, GSE201048 (ref. 80); dataset 4 (human CBMCs): CITE-seq, GSE100866 (ref. 1); dataset 5 (human glioblastomas): CITE-seq, GSM4972212 (ref. 81); dataset 6 (mouse glioblastomas): CITE-seq, GSE163120 (ref. 81); dataset 7 (mouse HSPCs): CITE-seq, GSE175702 (ref. 82); dataset 8 (human MALT tumor): CITE-seq, https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/malt_10k_protein_v3; datasets 9–10 (mouse splenic myeloid cells): CITE-seq, GSE149544 (ref. 83); dataset 11 (mouse naive brains): CITE-seq, GSE148127 (ref. 84); datasets 12–13 (human PBMCs): CITE-seq, GSE164378 (ref. 5); datasets 14–15 (human PBMCs): CITE-seq, https://zenodo.org/record/6348128#.Y5f40LJBzDU (ref. 30); datasets 21–22 (mouse spleen and lymph nodes): CITE-seq, GSE150599 (ref. 6); datasets 23–24 (human PBMCs): REAP-seq, GSE100501 (ref. 2); datasets 25–26 and 40–41 (human PBMCs): DOGMA-seq, GSE156478 (ref. 18); datasets 27 and 42 (human PBMCs): TEA-seq, GSE158013 (ref. 71); dataset 28 (human PBMCs): inCITE-seq, GSE163480 (ref. 85); dataset 29 (skin of mouse): SHARE-seq, GSE140203 (ref. 3); dataset 30 (adult brain of mouse): SHARE-seq, GSE140203 (ref. 3); dataset 31 (adult brain of mouse): SNARE-seq, GSE126074 (ref. 4); dataset 32 (adult brain of mouse): ISSAAC-seq, https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-11264/ (ref. 12); dataset 33 (adult brain of mouse): 10x Multiome, https://www.10xgenomics.com/resources/datasets/frozen-human-healthy-brain-tissue-3-k-1-standard-2-0-0/; dataset 34 (10,000 PBMCs with granulocytes removed): 10x Multiome, https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-2-0-0/; dataset 35 (3,000 PBMCs with granulocytes removed): 10x Multiome, https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0/; dataset 36 (10,000 PBMCs): 10x Multiome, https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-10-k-1-standard-2-0-0/; dataset 37 (3,000 PBMCs): 10x Multiome, https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-3-k-1-standard-2-0-0/; dataset 38 (mouse retina): 10x Multiome, GSE201402 (ref. 86); dataset 39 (human BMMCs): 10x Multiome, GSE194122 (ref. 79); dataset 43 (mouse spleen): scRNA-seq, GSE132901 (ref. 87); dataset 44 (mouse retina): scRNA-seq, GSE181251 (ref. 88); dataset 45 (mouse adult brain): scRNA-seq, GSE246147 (ref. 89); dataset 46 (mouse HSPCs): scRNA-seq, GSE175702 (ref. 82); dataset 47 (mouse retina): scATAC-seq, GSE181251 (ref. 88). Source data are provided with this paper.

Code availability

The code and scripts used for the benchmark study and for figure plotting are available on GitHub at https://github.com/QuKunLab/MultiomeBenchmarking/. The code is also archived in the Zenodo repository at https://doi.org/10.5281/zenodo.10540843 (ref. 90).

References

  1. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

  2. Peterson, V. M. et al. Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol. 35, 936–939 (2017).

  3. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e20 (2020).

  4. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

  5. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  6. Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).

  7. Zhang, L., Zhang, J. & Nie, Q. DIRECT-NET: an efficient method to discover cis-regulatory elements and construct regulatory networks from single-cell multiomics data. Sci. Adv. 8, eabl7393 (2022).

  8. Kartha, V. K. et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genom. 2, 100166 (2022).

  9. Li, C., Virgilio, M. C., Collins, K. L. & Welch, J. D. Multi-omic single-cell velocity models epigenome–transcriptome interactions and improves cell fate prediction. Nat. Biotechnol. 41, 387–398 (2023).

  10. Gorin, G., Svensson, V. & Pachter, L. Protein velocity and acceleration from single-cell multiomics experiments. Genome Biol. 21, 39 (2020).

  11. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  12. Xu, W. et al. ISSAAC-seq enables sensitive and flexible multimodal profiling of chromatin accessibility and gene expression in single cells. Nat. Methods 19, 1243–1249 (2022).

  13. Zhou, Z., Ye, C., Wang, J. & Zhang, N. R. Surface protein imputation from single cell transcriptomes by deep neural networks. Nat. Commun. 11, 651 (2020).

  14. Bennett, H. M., Stephenson, W., Rose, C. M. & Darmanis, S. Single-cell proteomics enabled by next-generation sequencing or mass spectrometry. Nat. Methods 20, 363–374 (2023).

  15. Gatto, L. et al. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat. Methods 20, 375–386 (2023).

  16. Lance, C. et al. Multimodal single cell data integration challenge: results and lessons learned. In Proc. NeurIPS 2021 Competitions and Demonstrations Track (eds. Kiela, D. et al.) 162–176 (PMLR, 2022).

  17. Bartosovic, M., Kabbe, M. & Castelo-Branco, G. Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues. Nat. Biotechnol. 39, 825–835 (2021).

  18. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).

  19. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  20. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).

  21. Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).

  22. Ashuach, T. et al. MultiVI: deep generative model for the integration of multimodal data. Nat. Methods 20, 1222–1231 (2023).

  23. Lakkis, J. et al. A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nat. Mach. Intell. 4, 940–952 (2022).

  24. Wu, K. E., Yost, K. E., Chang, H. Y. & Zou, J. BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc. Natl Acad. Sci. USA 118, e2023070118 (2021).

  25. Du, J.-H., Cai, Z. & Roeder, K. Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT. Proc. Natl Acad. Sci. USA 119, e2214414119 (2022).

  26. Lan, M., Zhang, S. & Gao, L. Efficient generation of paired single-cell multiomics profiles by deep learning. Adv. Sci. 10, 2301169 (2023).

  27. Wen, H. et al. Proc. 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, 2022).

  28. Yang, K. D. et al. Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nat. Commun. 12, 31 (2021).

  29. Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nat. Rev. Mol. Cell Biol. 24, 695–713 (2023).

  30. Cheng, M., Li, Z. & Costa, I. G. MOJITOO: a fast and universal method for integration of multimodal single-cell data. Bioinformatics 38, i282–i289 (2022).

  31. Lotfollahi, M., Litinetskaya, A. & Theis, F. J. Multigrate: single-cell multi-omic data integration. Preprint at bioRxiv https://doi.org/10.1101/2022.03.16.484643 (2022).

  32. Wang, R. H., Wang, J. & Li, S. C. Probabilistic tensor decomposition extracts better latent embeddings from single-cell multiomic data. Nucleic Acids Res. 51, e81 (2023).

  33. Kim, H. J., Lin, Y., Geddes, T. A., Yang, J. Y. H. & Yang, P. CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics 36, 4137–4143 (2020).

  34. Ma, A. et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat. Commun. 14, 964 (2023).

  35. Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol. 21, 25 (2020).

  36. Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 111 (2020).

  37. Li, G. et al. A deep generative model for multi-view profiling of single-cell RNA-seq and ATAC-seq data. Genome Biol. 23, 20 (2022).

  38. Lynch, A. W. et al. MIRA: joint regulatory modeling of multimodal expression and chromatin accessibility in single cells. Nat. Methods 19, 1097–1108 (2022).

  39. Singh, R., Hie, B. L., Narayan, A. & Berger, B. Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities. Genome Biol. 22, 131 (2021).

  40. Kriebel, A. R. & Welch, J. D. UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization. Nat. Commun. 13, 780 (2022).

  41. Zhang, Z. et al. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat. Commun. 14, 384 (2023).

  42. Ghazanfar, S., Guibentif, C. & Marioni, J. C. Stabilized mosaic single-cell data integration using unshared features. Nat. Biotechnol. 42, 284–292 (2024).

  43. De Biasi, S. et al. Circulating mucosal-associated invariant T cells identify patients responding to anti-PD-1 therapy. Nat. Commun. 12, 1669 (2021).

  44. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  45. Miao, Z., Humphreys, B. D., McMahon, A. P. & Kim, J. Multi-omics integration in the age of million single-cell data. Nat. Rev. Nephrol. 17, 710–724 (2021).

  46. Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).

  47. Wang, J. et al. Data denoising with transfer learning in single-cell transcriptomics. Nat. Methods 16, 875–878 (2019).

  48. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).

  49. Hu, Y. et al. WEDGE: imputation of gene expression values from single-cell RNA-seq datasets using biased matrix decomposition. Brief. Bioinform. 22, bbab085 (2021).

  50. Truong, K.-L. et al. Killer-like receptors and GPR56 progressive expression defines cytokine production of human CD4+ memory T cells. Nat. Commun. 10, 2263 (2019).

  51. Fergusson, J. R. et al. CD161intCD8+ T cells: a novel population of highly functional, memory CD8+ T cells enriched within the gut. Mucosal Immunol. 9, 401–413 (2016).

  52. Kung, P. C., Goldstein, G., Reinherz, E. L. & Schlossman, S. F. Monoclonal antibodies defining distinctive human T cell surface antigens. Science 206, 347–349 (1979).

  53. Liang, Y. & Tedder, T. F. Identification of a CD20-, FcϵRIβ-, and HTm4-Related gene family: sixteen new MS4A family members expressed in human and mouse. Genomics 72, 119–127 (2001).

  54. Ziegler-Heitbrock, H. W. L. & Ulevitch, R. J. CD14: cell surface receptor and differentiation marker. Immunol. Today 14, 121–125 (1993).

  55. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  56. Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).

  57. Gertz, J. et al. Distinct properties of cell-type-specific and shared transcription factor binding sites. Mol. Cell 52, 25–36 (2013).

  58. Kang, R. et al. EnhancerDB: a resource of transcriptional regulation in the context of enhancers. Database 2019, bay141 (2019).

  59. Buergel, T. et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 28, 2309–2320 (2022).

  60. Lewis, S. M. et al. Spatial omics and multiplexed imaging to explore cancer biology. Nat. Methods 18, 997–1012 (2021).

  61. Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).

  62. Linderman, G. C. et al. Zero-preserving imputation of single-cell RNA-seq data. Nat. Commun. 13, 192 (2022).

  63. Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).

  64. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022).

  65. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

  66. Su, G. et al. Spatial multi-omics sequencing for fixed tissue via DBiT-seq. STAR Protoc. 2, 100532 (2021).

  67. Liu, Y. et al. High-plex protein and whole transcriptome co-mapping at cellular resolution with spatial CITE-seq. Nat. Biotechnol. 41, 1405–1409 (2023).

  68. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  69. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  70. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  71. Swanson, E. et al. Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. eLife 10, e63632 (2021).

  72. Hand, D. J. & Till, R. J. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186 (2001).

  73. Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).

  74. Strehl, A. & Ghosh, J. Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002).

  75. Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).

  76. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  77. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  78. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

  79. Luecken, M. D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. In Proc. Neural Information Processing Systems Track on Datasets and Benchmarks (eds. Vanschoren, J. & Yeung, S.) 13 (NeurIPS, 2021).

  80. Kumar, P. et al. Single-cell transcriptomics and surface epitope detection in human brain epileptic lesions identifies pro-inflammatory signaling. Nat. Neurosci. 25, 956–966 (2022).

  81. Pombo Antunes, A. R. et al. Single-cell profiling of myeloid cells in glioblastoma across species and disease stage reveals macrophage competition and specialization. Nat. Neurosci. 24, 595–610 (2021).

  82. Konturek-Ciesla, A. et al. Temporal multimodal single-cell profiling of native hematopoiesis illuminates altered differentiation trajectories with age. Cell Rep. 42, 112304 (2023).

  83. Lukowski, S. W. et al. Absence of Batf3 reveals a new dimension of cell state heterogeneity within conventional dendritic cells. iScience 24, 102402 (2021).

  84. Golomb, S. M. et al. Multi-modal single-cell analysis reveals brain immune landscape plasticity during aging and gut microbiota dysbiosis. Cell Rep. 33, 108438 (2020).

  85. Chung, H. et al. Joint single-cell measurements of nuclear proteins and RNA in vivo. Nat. Methods 18, 1204–1212 (2021).

  86. Dou, J. et al. Bi-order multimodal integration of single-cell data. Genome Biol. 23, 112 (2022).

  87. Kimmel, J. C. et al. Murine single-cell RNA-seq reveals cell-identity-and tissue-specific trajectories of aging. Genome Res. 29, 2088–2103 (2019).

  88. Lyu, P. et al. Gene regulatory networks controlling temporal patterning, neurogenesis, and cell-fate specification in mammalian retina. Cell Rep. 37, 109994 (2021).

  89. Sun, W. et al. Spatial transcriptomics reveal neuron–astrocyte synergy in long-term memory. Nature 627, 374–381 (2024).

  90. Hu, Y. et al. Benchmarking algorithms for single-cell multi-omics prediction and integration. Zenodo https://doi.org/10.5281/zenodo.10540843 (2024).

Acknowledgements

This work was supported by the National Natural Science Foundation of China grants (T2125012 to K.Q.), the National Key R&D Program of China (2020YFA0112200 and 2022YFA1303200 to K.Q.), the National Natural Science Foundation of China grants (32170668 to B.L.; 12371383 and 61972368 to F.C.), CAS Project for Young Scientists in Basic Research YSBR-005 (to K.Q.), Anhui Province Science and Technology Key Program (202003a07020021 to K.Q.), the Fundamental Research Funds for the Central Universities (YD2070002019, WK9110000141 and WK2070000158 to K.Q.; WK0010000085 to Y.H.), Anhui Provincial Natural Science Foundation (2308085QA07 to Y.H.) and China Postdoctoral Science Foundation (2023M733383 to Y.H.). We thank the USTC supercomputing center and the School of Life Science Bioinformatics Center for providing computing resources for this project.

Author information

Contributions

K.Q., B.L. and F.C. conceived the project. Y.H., S.W. and Y. Luo designed the framework and performed data analysis with help from T.W., S.J., Y.Z., N.L. and Z.Y. Y. Li, W.D. and C.J. contributed to the revision. B.L., Y.H. and K.Q. wrote the manuscript with input from all authors. K.Q. supervised the entire project. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Falai Chen, Bin Li or Kun Qu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Jinmiao Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editors: Hui Hua and Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of eleven algorithms in predicting RC protein abundance from RNA expression.

a, b, Average PCC (a) and CMD (b) values between the reference and predicted RC protein expression for the intra-dataset scenario, that is, the training and test sets are from the same datasets. The X and Y axes are the cell‒cell and protein‒protein PCC/CMD, respectively, and the dashed lines are the medians of all algorithms’ results. Error bar: standard deviation of 23 datasets. Data are presented as mean values +/− 0.5xSD. c, d, Same as (a) and (b), but the results were predicted for the inter-dataset scenario, that is, the training and test sets are from different datasets. Error bar: standard deviation of 10 datasets. e, Average RMSE values between the reference data and the predicted results for the intra-dataset scenario (X axes) and inter-dataset scenario (Y axes). Error bars: standard deviation of 23 datasets (X axes) or 10 datasets (Y axes). Data are presented as mean values +/− 0.5xSD. f, g, Rank index (RI) values of eleven algorithms in the intra-dataset (f) and inter-dataset (g) scenarios. h, The overall performance of eleven algorithms in both intra-dataset and inter-dataset scenarios. Source data for this figure are provided.
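The PCC and RMSE comparisons described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the reference and predicted data are cells × proteins matrices, reads the cell‒cell PCC as a per-cell correlation across proteins and the protein‒protein PCC as a per-protein correlation across cells, and uses the sample standard deviation for the mean +/− 0.5xSD bars. The function names are hypothetical.

```python
import numpy as np

def rowwise_pcc(a, b):
    """Pearson correlation between matching rows of two equally shaped matrices."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    # Rows with zero variance have an undefined correlation; report NaN for them.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (a * b).sum(axis=1) / denom, np.nan)

def prediction_scores(reference, predicted):
    """Average per-cell PCC, per-protein PCC and overall RMSE (assumed definitions)."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return {
        # One correlation per cell (row), averaged over cells.
        "cell_pcc": np.nanmean(rowwise_pcc(reference, predicted)),
        # One correlation per protein (column), averaged over proteins.
        "feature_pcc": np.nanmean(rowwise_pcc(reference.T, predicted.T)),
        "rmse": float(np.sqrt(np.mean((reference - predicted) ** 2))),
    }

def mean_and_half_sd(values):
    """Mean with a +/- 0.5 x SD interval (sample SD assumed) across datasets."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    half_sd = 0.5 * values.std(ddof=1)
    return mean, mean - half_sd, mean + half_sd
```

Calling `prediction_scores` once per dataset and passing the resulting per-dataset values to `mean_and_half_sd` would give one point with error bars per algorithm, under the assumptions stated above.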

Extended Data Fig. 2 Performance of eleven algorithms in predicting RU protein abundance from RNA expression.

a, b, Average PCC (a) and CMD (b) values between the reference and predicted RU protein abundance for the intra-dataset scenario, that is, the training and test sets are from the same datasets. The X and Y axes are the cell‒cell and protein‒protein PCC/CMD, respectively, and the dashed lines are the medians of all algorithms’ results. Error bar: standard deviation of 23 datasets. Data are presented as mean values +/− 0.5xSD. c, d, Same as (a) and (b), but the results were predicted for the inter-dataset scenario, that is, the training and test sets are from different datasets. Error bar: standard deviation of 10 datasets. e, Average RMSE values between the reference data and the predicted results for the intra-dataset scenario (X axes) and inter-dataset scenario (Y axes). Error bars: standard deviation of 23 datasets (X axes) or 10 datasets (Y axes). Data are presented as mean values +/− 0.5xSD. f, g, Rank index (RI) values of seven algorithms in the intra-dataset (f) and inter-dataset (g) scenarios. h, The overall performance of seven algorithms in both intra-dataset and inter-dataset scenarios. Source data for this figure are provided.
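The captions do not expand the abbreviation CMD. Assuming it denotes the correlation matrix distance, 1 − tr(R1·R2) / (‖R1‖F·‖R2‖F), where R1 and R2 are the correlation matrices of the reference and predicted data, a small sketch would look like this. The function name and the rows-as-variables convention are illustrative choices, not taken from the paper.

```python
import numpy as np

def correlation_matrix_distance(reference, predicted):
    """Correlation matrix distance between two matrices with matching rows.

    Assumed definition: 1 - tr(R1 @ R2) / (||R1||_F * ||R2||_F),
    where R1 and R2 are the row-by-row correlation matrices.
    0 means identical correlation structure; values closer to 1 mean
    the two structures differ more.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # np.corrcoef treats each row as a variable by default.
    r_ref = np.corrcoef(reference)
    r_pred = np.corrcoef(predicted)
    numerator = np.trace(r_ref @ r_pred)
    denominator = np.linalg.norm(r_ref, "fro") * np.linalg.norm(r_pred, "fro")
    return 1.0 - numerator / denominator
```

With a cells × proteins matrix, passing the matrices as they are compares cell‒cell correlation structure, and passing their transposes compares protein‒protein structure.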

Extended Data Fig. 3 Performance of nine chromatin accessibility prediction algorithms when converting peaks to DORCs.

a, b, Average PCC (a) and CMD (b) values between the reference data and the predicted results for the intra-dataset scenario, that is, the training and test sets are from the same datasets. The X and Y axes are the cell‒cell and DORC‒DORC PCC/CMD, respectively, and the dashed lines are the medians of all algorithms’ results. Error bar: standard deviation of 11 datasets. Data are presented as mean values +/− 0.5xSD. c, Average RMSE values between the reference data and the predicted results for the intra-dataset scenario (X axes) and inter-dataset scenario (Y axes). Error bar: standard deviation of 11 datasets (X axes) or 8 datasets (Y axes). Data are presented as mean values +/− 0.5xSD. d, e, Same as (a) and (b), but the results were predicted for the inter-dataset scenario, that is, the training and test sets are from different datasets. Error bar: standard deviation of 8 datasets. f, g, Rank index (RI) values of nine algorithms in the intra-dataset (f) and inter-dataset (g) scenarios. h, The overall performance of nine algorithms in both intra-dataset and inter-dataset scenarios. Source data for this figure are provided.

Extended Data Fig. 4 Performance of nine chromatin accessibility prediction algorithms when using smoothed ATAC-seq matrix.

a, b, Average PCC (a) and CMD (b) values between the KNN-smoothed reference data and the predicted results for the intra-dataset scenario, that is, the training and test sets are from the same datasets. The X and Y axes are the cell‒cell and peak‒peak PCC/CMD, respectively, and the dashed lines are the medians of all algorithms’ results. Error bar: standard deviation of 11 datasets. Data are presented as mean values +/− 0.5xSD. c, Average RMSE values between the KNN-smoothed reference data and the predicted results for the intra-dataset scenario (X axes) and inter-dataset scenario (Y axes). Error bar: standard deviation of 11 datasets (X axes) or 8 datasets (Y axes). Data are presented as mean values +/− 0.5xSD. d, e, Same as (a) and (b), but the results were predicted for the inter-dataset scenario, that is, the training and test sets are from different datasets. Error bar: standard deviation of 8 datasets. f, g, Rank index (RI) values of nine algorithms in the intra-dataset (f) and inter-dataset (g) scenarios. h, The overall performance of nine algorithms in both intra-dataset and inter-dataset scenarios. Source data for this figure are provided.

Extended Data Fig. 5 Computational resources consumed by the fourteen multi-omics prediction algorithms.

a, b, The computational time and memory cost of eleven algorithms for predicting protein abundance in datasets with different numbers of cells. Guanlab-dengkw and scArches reported memory errors and stopped when processing the dataset with 500k cells. Error bar: standard deviation of 5 down-samplings and 2 tests. Data are presented as mean values +/− 0.5xSD. c, d, The computer time and memory cost of nine algorithms for predicting chromatin accessibility in datasets with different numbers of cells. Error bar: standard deviation of 5 down-samplings and 2 tests. Data are presented as mean values +/− 0.5xSD. Source data for this figure are provided.

Extended Data Fig. 6 Computational resources consumed by eighteen single-cell multi-omics integration algorithms.

a, Computational time and memory used by nine vertical integration algorithms when integrating RNA expression and protein abundance for datasets with different numbers of cells. CiteFuse reported memory errors and stopped when processing datasets with over 20k cells. Error bars: standard deviation across 5 down-samplings and 2 tests. Data are presented as mean values ± 0.5 × SD. b, Same as (a), but the results were generated by twelve vertical integration algorithms when integrating RNA expression and chromatin accessibility. scAI reported memory errors and stopped when processing datasets with over 20k cells. c, Computational time and memory cost of five horizontal integration algorithms when integrating single-cell RNA + Protein data for datasets with different numbers of cells. Error bars: standard deviation across 5 down-samplings and 2 tests. Data are presented as mean values ± 0.5 × SD. d, Same as (c), but the results were generated by seven horizontal integration algorithms when integrating single-cell RNA + ATAC data. e, Computational time and memory cost of seven mosaic integration algorithms when integrating scRNA-seq and single-cell RNA + Protein data for datasets with different numbers of cells. Error bars: standard deviation across 5 down-samplings and 2 tests. Data are presented as mean values ± 0.5 × SD. f–h, Same as (e), but the results were generated by mosaic integration algorithms when integrating scRNA-seq data and single-cell RNA + ATAC data (f), integrating scATAC-seq data and single-cell RNA + ATAC data (g), and integrating single-cell RNA + Protein data and single-cell RNA + ATAC data (h). Source data for this figure are provided.

Extended Data Fig. 7 Summary of the performance of the fourteen multi-omics prediction algorithms.

The figure shows: (i) the properties of these algorithms, including the programming languages, methodologies, and GPU acceleration requirements; (ii) the overall performance of these algorithms, evaluated by six metrics in both the inter- and intra-dataset scenarios, where a lighter color (and/or a larger dot) indicates better performance for a given metric; (iii) the computational time and memory consumed by these algorithms for different sizes of datasets; ‘NA’ indicates a memory error or invalid result. Source data for this figure are provided.

Extended Data Fig. 8 Summary of the performance of the fifteen vertical integration algorithms.

The figure shows: (i) the properties of these algorithms, including the programming languages, methodologies, and GPU acceleration requirements; (ii) the overall performance of these algorithms, evaluated by four metrics; (iii) the computational time and memory consumed by these algorithms for different sizes of datasets; ‘NA’ indicates a memory error or invalid result. Source data for this figure are provided.

Extended Data Fig. 9 Summary of the performance of nine horizontal integration algorithms.

The figure shows: (i) the properties of these algorithms, including the programming languages, methodologies, and GPU acceleration requirements; (ii) the overall performance of these algorithms, evaluated by ten metrics in both the inter- and intra-dataset scenarios; (iii) the computational time and memory consumed by these algorithms for different sizes of datasets; ‘NA’ indicates a memory error or invalid result. Source data for this figure are provided.

Extended Data Fig. 10 Summary of the performance of eight mosaic integration algorithms.

The figure shows: (i) the properties of these algorithms, including the programming languages, methodologies, and GPU acceleration requirements; (ii) the overall performance of these algorithms, evaluated by ten metrics in both the inter- and intra-dataset scenarios; (iii) the computational time and memory consumed by these algorithms for different sizes of datasets; ‘NA’ indicates a memory error or invalid result. Source data for this figure are provided.

Supplementary information

Supplementary Information

Supplementary Figs. 1–59

Reporting Summary

Peer Review File

Supplementary Tables 1–9

Supplementary Table 1: Multi-omics prediction algorithm properties.

Supplementary Table 2: Detailed information of 47 multi-omics datasets.

Supplementary Table 3: Quality-control parameters for 39 multi-omics datasets used for prediction algorithms.

Supplementary Table 4: Vertical integration algorithm properties.

Supplementary Table 5: Horizontal integration algorithm properties.

Supplementary Table 6: Mosaic integration algorithm properties.

Supplementary Table 7: Detailed information of 24 single-cell multi-omics datasets for vertical integration.

Supplementary Table 8: Detailed information of 19 single-cell multi-omics data groups used for benchmarking horizontal integration algorithms.

Supplementary Table 9: Detailed information of 55 paired datasets used for benchmarking mosaic integration algorithms.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, Y., Wan, S., Luo, Y. et al. Benchmarking algorithms for single-cell multi-omics prediction and integration. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02429-w

  • DOI: https://doi.org/10.1038/s41592-024-02429-w
