  • Analysis

Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution

Abstract

Spatial transcriptomics approaches have substantially advanced our capacity to detect the spatial distribution of RNA transcripts in tissues, yet it remains challenging to characterize whole-transcriptome-level data for single cells in space. Addressing this need, researchers have developed integration methods to combine spatial transcriptomic data with single-cell RNA-seq data to predict the spatial distribution of undetected transcripts and/or perform cell type deconvolution of spots in histological sections. However, to date, no independent studies have comparatively analyzed these integration methods to benchmark their performance. Here we present benchmarking of 16 integration methods using 45 paired datasets (comprising both spatial transcriptomics and scRNA-seq data) and 32 simulated datasets. We found that Tangram, gimVI, and SpaGE outperformed other integration methods for predicting the spatial distribution of RNA transcripts, whereas Cell2location, SpatialDWLS, and RCTD are the top-performing methods for the cell type deconvolution of spots. We provide a benchmark pipeline to help researchers select optimal integration methods to process their datasets.

Fig. 1: Benchmarking workflow and summary characteristics of the examined paired datasets.
Fig. 2: Comparing the accuracy of eight integration methods capable of predicting the spatial distribution of RNA transcripts.
Fig. 3: Comparing the accuracy of the eight integration methods for sparse spatial expression matrices down-sampled from the original datasets using Splatter.
Fig. 4: Comparing the performance of the 12 integration methods capable of deconvoluting cell types of each histological spot.

Data availability

A summary of the individual accession numbers is given in Supplementary Table 1. The raw data are available from the following studies:

Dataset 1 (mouse gastrulation): seqFISH, https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; 10X Chromium, ‘Sample 21’ in the R/Bioconductor data package MouseGastrulationData.

Dataset 2 (mouse embryonic stem cell): seqFISH, https://zenodo.org/record/3735329#.YY69HZMza3J; Microwell-Seq, ‘EmbryonicStemCells’ in ‘MCA_BatchRemoved_Merge_dge.h5ad’ file in https://figshare.com/articles/dataset/MCA_DGE_Data/5435866.

Dataset 3 (mouse hippocampus): seqFISH, https://ars.els-cdn.com/content/image/1-s2.0-S0896627316307024-mmc6.xlsx; 10X Chromium, ‘HIPP_sc_Rep1_10X sample’ in GSE158450 in the GEO database.

Dataset 4 (mouse cortex): seqFISH+, https://github.com/CaiGroup/seqFISH-PLUS, with the spatial coordinates of each spot generated using the ‘stitchFieldCoordinates’ function in Giotto; Smart-seq, mouse primary visual cortex (VISp) in the dataset in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 5 (mouse olfactory bulb): seqFISH+, https://github.com/CaiGroup/seqFISH-PLUS; Drop-seq, GSE148360 in the GEO database.

Dataset 6 (mouse hypothalamic preoptic region): MERFISH, the eighteenth female parent mouse (animal ID = 18) in https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248; 10X Chromium, GSE113576 in the GEO database.

Dataset 7 (human osteosarcoma): MERFISH, the ‘B1_cell’ used in https://www.pnas.org/doi/suppl/10.1073/pnas.1912459116/suppl_file/pnas.1912459116.sd12.csv; 10X Chromium, BC22 in GSE152048 in the GEO database.

Dataset 8 (mouse primary motor cortex): MERFISH, ‘mouse1_slice162’ in https://caltech.box.com/shared/static/dzqt6ryytmjbgyai356s1z0phtnsbaol.gz; 10X Chromium, https://data.nemoarchive.org/biccn/lab/zeng/transcriptome/scell/10x_v3/mouse/processed/analysis/10X_cells_v3_AIBS/.

Dataset 9 (mouse VISP): MERFISH, https://github.com/spacetx-spacejam/data/; Smart-seq, mouse primary visual cortex (VISp) in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 10 (mouse visual cortex): STARmap, ‘20180505_BY3_1kgenes’ in https://www.starmapresources.com/data; Smart-seq, mouse primary visual cortex (VISp) in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 11 (mouse prefrontal cortex): STARmap, ‘20180419_BZ9_control’ in https://www.starmapresources.com/data; 10X Chromium, ‘PFC_sc_Rep2_10X’ in GSE158450 in the GEO database.

Dataset 12 (human middle temporal gyrus): ISS, https://github.com/spacetx-spacejam/data; Smart-seq, https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-smart-seq.

Dataset 13 (mouse VISP): ISS, https://github.com/spacetx-spacejam/data; Smart-seq, mouse primary visual cortex (VISp) in the dataset in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 14 (Drosophila embryo): FISH, https://github.com/rajewsky-lab/distmap; Drop-seq, GSE95025 in the Gene Expression Omnibus (GEO) database.

Dataset 15 (mouse somatosensory cortex): osmFISH, cortical regions in http://linnarssonlab.org/osmFISH/; Smart-seq, mouse somatosensory cortex (SSp) in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-smart-seq.

Dataset 16 (mouse VISP): BaristaSeq, https://github.com/spacetx-spacejam/data; Smart-seq, mouse primary visual cortex (VISp) in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 17 (mouse VISP): ExSeq, https://github.com/spacetx-spacejam/data; Smart-seq, mouse primary visual cortex (VISp) in https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq.

Dataset 18 (mouse hindlimb muscle): 10X Visium, Vis5A in GSE161318 in the GEO database; 10X Chromium, D2_Ev3 in GSE159500 in the GEO database.

Dataset 19 (mouse hindlimb muscle): 10X Visium, Vis9A in GSE161318 in the GEO database; 10X Chromium, D7_Ev3 in GSE159500 in the GEO database.

Dataset 20 (human breast cancer): 10X Visium, ‘CID3586’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 21 (human breast cancer): 10X Visium, ‘1160920F’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 22 (human breast cancer): 10X Visium, ‘CID4290’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 23 (human breast cancer): 10X Visium, ‘CID4465’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 24 (human breast cancer): 10X Visium, ‘CID44971’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 25 (human breast cancer): 10X Visium, ‘CID4535’ in https://zenodo.org/record/4739739#.YY6N_pMzaWC; 10X Chromium, ‘CID3586’ in GSE176078 in the GEO database.

Dataset 26 (zebrafish melanoma): 10X Visium, ‘Visium-A’ in GSE159709 in the GEO database; 10X Chromium, ‘SingleCell-E’ in GSE159709 in the GEO database.

Dataset 27 (mouse embryo): 10X Visium, ‘Visium-A1’ in GSE160137 in the GEO database; 10X Chromium, ‘Pax2-GFP_SC-2’ in GSE143806 in the GEO database.

Dataset 28 (human prostate): 10X Visium, ‘D25’ in GSE159697 in the GEO database; 10X Chromium, ‘V8’ in GSE142489 in the GEO database.

Dataset 29 (mouse kidney): 10X Visium, Sham Model in GSE171406 in the GEO database; 10X Chromium, wild-type sham mouse in GSE171639 in the GEO database.

Dataset 30 (mouse kidney): 10X Visium, ischemia reperfusion injury model in GSE171406 in the GEO database; 10X Chromium, wild-type ischemic acute kidney injury mouse in GSE171639 in the GEO database.

Dataset 31 (mouse brain): 10X Visium, ‘section1’ in GSE153424 in the GEO database; 10X Chromium, ‘brain1_cx’ in GSE153424 in the GEO database.

Dataset 32 (mouse prefrontal cortex): 10X Visium, ‘Visium_10X’ in GSE158450 in the GEO database; 10X Chromium, ‘PFC_sc_Rep1_10X’ in GSE158450 in the GEO database.

Dataset 33 (mouse hippocampus): 10X Visium, ‘Visium_10X’ in GSE158450 in the GEO database; 10X Chromium, ‘HIPP_sc_Rep1_10X’ in GSE158450 in the GEO database.

Dataset 34 (mouse kidney): 10X Visium, GSE154107 in the GEO database; 10X Chromium, sample ‘(LPS36hr) scRNA-seq’ in GSE151658 in the GEO database.

Dataset 35 (human prostate): 10X Visium, ‘ETOH’ in GSE159697 in the GEO database; 10X Chromium, ‘V8’ in GSE142489 in the GEO database.

Dataset 36 (mouse lymph node): 10X Visium, ‘PBS’ samples of Tissue 1 in https://github.com/romain-lopez/DestVI-reproducibility; 10X Chromium, ‘PBS’ samples in https://github.com/romain-lopez/DestVI-reproducibility.

Dataset 37 (mouse MCA205 tumor): 10X Visium, Tumor A1 of Tissue 1 in https://github.com/romain-lopez/DestVI-reproducibility; 10X Chromium, https://github.com/romain-lopez/DestVI-reproducibility.

Dataset 38 (mouse primary motor cortex): 10X Visium, https://storage.googleapis.com/tommaso-brain-data/tangram_demo/Allen-Visium_Allen1_cell_count.h5ad; 10X Chromium, ‘batch 9’ in ‘mop_sn_tutorial.h5ad’ file from https://console.cloud.google.com/storage/browser/tommaso-brain-data.

Dataset 39 (mouse primary motor cortex): Slide-seq, https://storage.googleapis.com/tommaso-brain-data/tangram_demo/slideseq_MOp_1217.h5ad.gz; 10X Chromium, ‘batch 9’ in ‘mop_sn_tutorial.h5ad’ file from https://console.cloud.google.com/storage/browser/tommaso-brain-data.

Dataset 40 (mouse cerebellum): Slide-seqV2, SCP948 in https://singlecell.broadinstitute.org/single_cell/; 10X Chromium, sample M003 of study SCP795 in https://singlecell.broadinstitute.org/single_cell/.

Dataset 41 (mouse hippocampus): Slide-seqV2, ‘Puck_200115_08’ in https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-download; Drop-seq, we randomly sampled 10,000 cells from ‘GSE116470_F_GRCm38.81.P60Hippocampus.raw.dge.txt.gz’ file in GSE116470 in the GEO database.

Dataset 42 (human squamous carcinoma): ST, GSM4284322 in the GEO database; 10X Chromium, ‘GSE144236_cSCC_counts.txt.gz’ in GSE144236 in the GEO database.

Dataset 43 (mouse hippocampus): ST, wild-type replicate 1 in https://data.mendeley.com/datasets/6s959w2zyr/1; 10X Chromium, GSE116470 in the GEO database.

Dataset 44 (mouse olfactory bulb): HDST, replicate1 in GSE130682 in the GEO database; 10X Chromium, WT1 samples used from GSE121891 in the GEO database.

Dataset 45 (mouse liver): Seq-scope, https://deepblue.lib.umich.edu/data/downloads/gx41mj14n; Smart-seq2, liver sample in GSE109774 in the GEO database.

We also provide an open source website for users to download all the above datasets: https://drive.google.com/drive/folders/1pHmE9cg_tMcouV1LFJFtbyBJNp7oQo9J?usp=sharing.

Source data for figures and Extended Data Figures are provided with this paper.
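As an illustration of the subsampling described for the Drop-seq reference of Dataset 41 (10,000 randomly sampled cells), the following is a minimal sketch of drawing a fixed-size random sample of cells without replacement. The function name, the seed and the toy barcodes are hypothetical; the sampling code and seed actually used are not given here.

```python
import random

def sample_cells(cell_ids, n_cells=10_000, seed=0):
    """Return a reproducible random subset of cell identifiers.

    Assumption: sampling is uniform and without replacement; the seed
    is an illustrative choice, not a value taken from the paper.
    """
    cell_ids = list(cell_ids)
    if len(cell_ids) <= n_cells:
        return cell_ids  # nothing to subsample
    rng = random.Random(seed)
    return rng.sample(cell_ids, n_cells)

# Toy example with hypothetical barcodes.
barcodes = [f"cell_{i}" for i in range(50_000)]
subset = sample_cells(barcodes)
```

Fixing the seed makes the subset reproducible, which matters when the same reference cells must be reused across all integration methods.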

Code availability

We uploaded the code and scripts used for the comparative analysis and figure plotting to GitHub: https://github.com/QuKunLab/SpatialBenchmarking. The package can also be used to analyze users’ own datasets.

Acknowledgements

This work was supported by the National Key R&D Program of China (2020YFA0112200 to K.Q.), the National Natural Science Foundation of China grants (T2125012, 91940306, 31970858, and 31771428 to K.Q.; 32170668 to B.L.; 81871479 to J.L.), CAS Project for Young Scientists in Basic Research YSBR-005 (to K.Q.), Anhui Province Science and Technology Key Program (202003a07020021 to K.Q.) and the Fundamental Research Funds for the Central Universities (YD2070002019, WK9110000141, and WK2070000158 to K.Q.; WK9100000001 to J.L.). We thank the USTC supercomputing center and the School of Life Science Bioinformatics Center for providing computing resources for this project.

Author information

Contributions

K.Q. and B.L. conceived the project. B.L., W.Z. and C.G. designed the framework and performed data analysis with help from H.X., L.L., M.F., Y.H., X.Y., X.Z., F.C. and T.X. X.Z., M.T., K.L., J.L. and L.C. contributed to revision of the manuscript. B.L., K.Q., C.G. and W.Z. wrote the manuscript with input from all authors. K.Q. supervised the entire project. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kun Qu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ahmed Mahfouz and the other, anonymous, reviewer for their contribution to the peer review of this work. Primary Handling editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparing the accuracy of eight integration methods in predicting the spatial distribution of RNA transcripts.

a, The spatial distribution of COL17A1 in dataset 42 (ST; 10X Chromium; human squamous carcinoma), including the ground truth and prediction results from the integration methods. PCC: Pearson Correlation Coefficient between the expression vector of a transcript in the ground truth and that of the predicted result. b, Bar plots of PCC, SSIM, RMSE, and JS of each integration method in predicting the spatial distribution of transcripts of dataset 42. SSIM: Structural Similarity Index; RMSE: Root Mean Square Error; JS: Jensen-Shannon divergence. Data are presented as mean values ± 95% confidence intervals; n = 948 predicted genes. c, The violin plot of AS (accuracy score, aggregated from PCC, SSIM, RMSE, and JS; see Methods) of the eight integration methods for transcripts in dataset 42. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; n = 4 benchmark metrics.
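To make three of the metrics named in this legend concrete, the following is a minimal sketch that computes PCC, RMSE and JS for one transcript, given its ground-truth and predicted expression vectors across spots. The function names are hypothetical, and the preprocessing (sum-normalizing each vector before JS, base-2 logarithm, no rescaling before RMSE) is an assumption rather than the procedure defined in Methods. SSIM is omitted because it operates on the expression values arranged as an image.

```python
import math

def pearson(truth, pred):
    """Pearson correlation coefficient between two expression vectors."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (st * sp)

def rmse(truth, pred):
    """Root mean square error between two expression vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def js_divergence(truth, pred):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1]).

    Assumption: each vector is non-negative and is first scaled to sum to 1.
    """
    p = [t / sum(truth) for t in truth]
    q = [v / sum(pred) for v in pred]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy vectors: the prediction has the right pattern but twice the scale.
truth = [0.0, 1.0, 2.0, 3.0]
pred = [0.0, 2.0, 4.0, 6.0]
pcc_value = pearson(truth, pred)
rmse_value = rmse(truth, pred)
js_value = js_divergence(truth, pred)
```

In this toy case PCC and JS report a perfect match in spatial pattern while RMSE is non-zero, which shows why scale-sensitive and scale-insensitive metrics are reported side by side.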

Extended Data Fig. 2 The boxplots of PCC, SSIM, RMSE, JS values of each integration method in predicting the spatial distribution of RNA transcripts of 45 paired spatial transcriptomics and scRNA-seq datasets.

Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. The number of genes for each dataset is shown at the top of each panel.

Extended Data Fig. 3 PCC, SSIM, RMSE, JS and AS of spatial distribution of RNA transcripts predicted by each integration method for the 45 paired spatial transcriptomics and scRNA-seq datasets.

a-g, Boxplots of AS (accuracy score, aggregated from PCC, SSIM, RMSE, and JS; see Methods) of the integration methods for transcripts in the 17 image-based datasets (a), 28 seq-based datasets (b), 32 simulated datasets (c), 21 10X Visium datasets (d), 5 seqFISH datasets (e), 4 MERFISH datasets (f), 3 Slide-seq datasets (g). Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. Grey dots indicate that the prediction result is not available, as the tool raised an error during prediction.

Source data

Extended Data Fig. 4 The PCC values of each integration method when processing the raw expression matrices and the normalized expression matrices.

R-R: raw expression matrix of spatial data and raw expression matrix of scRNA-seq data; N-R: normalized expression matrix of spatial data and raw expression matrix of scRNA-seq data; R-N: raw expression matrix of spatial data and normalized expression matrix of scRNA-seq data; N-N: normalized expression matrix of spatial data and normalized expression matrix of scRNA-seq data; n = 43 independent datasets. Datasets 6 and 8 are excluded, as the expression matrices of their spatial data had already been normalized.

Source data

Extended Data Fig. 5 Impact of normalization on the accuracy of eight integration methods that can predict the spatial distribution of RNA transcripts.

a, b, Boxplots of the PCC values of the eight integration methods for 28 seq-based datasets (a) or 15 image-based datasets (b) when using the four schemes of input expression matrices (that is, R-R, R-N, N-R, and N-N; see their definitions in the legend of Extended Data Fig. 4). For the genes predicted by each method, we removed outliers using the 10%-90% confidence interval. Statistical significance was analyzed with a two-sided paired t-test; *P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. c-f, Boxplots of the AS values of the eight integration methods for the 43 paired datasets when using the four schemes of input expression matrices. For the genes predicted by each method, we removed outliers using the 10%-90% confidence interval. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; n = 43 independent datasets.

Source data

Extended Data Fig. 6 Correlation between the four metrics (PCC, SSIM, RMSE, and JS) and the sparsity of each examined spatial expression matrix.

For all eight integration methods that can predict the spatial distribution of transcripts, the JS values are positively and linearly correlated with the sparsity of the expression matrices of the spatial transcriptomics data (R² ≥ 0.50).
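The relationship described in this legend can be sketched as follows. This is a minimal illustration that assumes sparsity means the fraction of zero entries in a spots-by-genes expression matrix and that R² is taken from a simple least-squares line; both function names are hypothetical.

```python
def sparsity(matrix):
    """Fraction of zero entries in a spots-by-genes expression matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (equals squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```

In use, `x` would hold one sparsity value per dataset and `y` the corresponding metric value (for example, the mean JS of one method on that dataset).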

Source data

Extended Data Fig. 7 Comparing the accuracy of the eight integration methods for sparse expression matrices down-sampled from the original datasets using Scuttle.

a, Spatial distribution of Cplx1 expression in dataset 4 (seqFISH+; Smart-seq; mouse cortex), predicted from the original data and down-sampled data (down-sampling rate = 0.8). b, PCC of the spatial distribution of transcripts predicted from the original data and down-sampled data from dataset 4. The PCC values of the red-colored transcripts are greater than 0.5 for both the original and the down-sampled data. The proportion of the red-colored transcripts in all transcripts was defined as the ‘robustness score’ (RS). c, RS values of the eight integration methods when processing sparse expression matrices down-sampled from dataset 4 at different down-sampling rates. d, RS values of the eight integration methods when processing the sparse expression matrices of the down-sampled datasets. The original datasets (used to generate the down-sampled datasets) capture >1000 genes from >100 spots, and the sparsity of the expression matrices is <0.7. Data are presented as mean values ± 95% confidence intervals; n = 19 independent datasets.
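The robustness score defined in panel b can be sketched as below. The function name, the dictionary inputs, and the gene labels in the usage note are hypothetical; the 0.5 threshold and the "proportion of all transcripts" definition follow the legend, and the sketch assumes both inputs cover the same set of transcripts.

```python
def robustness_score(pcc_original, pcc_downsampled, threshold=0.5):
    """Proportion of transcripts whose PCC exceeds `threshold` in both runs.

    Both arguments map a transcript name to its PCC value. Only transcripts
    present in both mappings are counted (assumed to be all transcripts).
    """
    genes = pcc_original.keys() & pcc_downsampled.keys()
    robust = [
        g for g in genes
        if pcc_original[g] > threshold and pcc_downsampled[g] > threshold
    ]
    return len(robust) / len(genes)
```

For example, with made-up PCC values `{"gene_a": 0.8, "gene_b": 0.6, "gene_c": 0.3, "gene_d": 0.9}` for the original data and `{"gene_a": 0.7, "gene_b": 0.4, "gene_c": 0.2, "gene_d": 0.55}` for the down-sampled data, two of the four transcripts pass the threshold in both runs, so RS is 0.5.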

Source data

Extended Data Fig. 8 Comparing the performance of the twelve integration methods in cell type deconvolution.

a, PCC, SSIM, RMSE, and JS values for the cell type composition of the spots simulated from dataset 10, generated by twelve integration methods. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; n = 12 predicted cell types. b, A seqFISH+ slide of dataset 4 (seqFISH+; Smart-seq; mouse cortex) with cells annotated by cell type. Each grid represents a simulated spot containing 1–18 cells. c, The proportion of L5&6 excitatory neurons in the spots simulated from dataset 4, including the ground truth and the predicted results of twelve integration methods. d, PCC, SSIM, RMSE, and JS values for the cell type composition of the spots simulated from dataset 4, generated by twelve integration methods. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; n = 8 predicted cell types. e, PCC, SSIM, RMSE, and JS values for the cell type composition of the spots in all the simulated datasets (n = 32), generated by ten integration methods. SpaOTsc and novoSpaRc are excluded, as they require spatial location information for each spot, which is not available in the simulated datasets. Data are presented as mean values ± 95% confidence intervals; n = 32 independent datasets.
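The grid-based spot simulation in panel b can be illustrated with a small sketch: cells with known coordinates and cell-type labels are binned into square grids, and the cell-type proportions of each grid serve as the ground truth for deconvolution. The function name, the input layout, and the `grid_size` parameter are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

def simulate_spots(cells, grid_size):
    """Bin annotated cells into square grids and return cell-type proportions.

    cells: iterable of (x, y, cell_type) tuples.
    grid_size: side length of each square grid, in the same units as x and y.
    Returns {(col, row): {cell_type: proportion}}; empty grids are omitted.
    """
    bins = defaultdict(Counter)
    for x, y, cell_type in cells:
        key = (int(x // grid_size), int(y // grid_size))
        bins[key][cell_type] += 1
    return {
        key: {ct: n / sum(counts.values()) for ct, n in counts.items()}
        for key, counts in bins.items()
    }
```

Choosing a larger `grid_size` places more cells in each simulated spot, which mimics a lower-resolution spatial technology.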

Source data

Extended Data Fig. 9 Computer resources consumed by each integration method.

a-c, The impact of the number of cells in scRNA-seq data (a), the number of spots in spatial data (b), and the number of genes used for training (c) on the computational resources consumed by the integration methods that can predict the spatial distribution of undetected transcripts. d, The computation time and memory consumed by the integration methods that can deconvolute cell types of histological spots when processing a simulated dataset that contains 20,000 spots in its spatial transcriptomics data and 10,000 cells in its scRNA-seq data. e-g, The impact of the number of cells in scRNA-seq data (e), the number of spots in spatial data (f), and the number of cell types (g) on the computational resources consumed by the integration methods that can deconvolute cell types of histological spots.

Source data

Cite this article

Li, B., Zhang, W., Guo, C. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat Methods 19, 662–670 (2022). https://doi.org/10.1038/s41592-022-01480-9

  • DOI: https://doi.org/10.1038/s41592-022-01480-9
