Abstract
Spatial clustering, which is analogous to single-cell clustering, has expanded the scope of tissue physiology studies from cell-centric to structure-centric analyses of spatially resolved transcriptomics (SRT) data. Computational methods for this task have developed remarkably in recent years, but a comprehensive benchmark study is still lacking. Here we present a benchmark of 13 computational methods on 34 SRT data samples from 7 datasets. Performance was evaluated on the basis of accuracy, spatial continuity, marker gene detection, scalability, and robustness. We found that existing methods are complementary in terms of performance and functionality, and we provide guidance for selecting appropriate methods for given scenarios. Testing on 22 additional challenging datasets revealed that identifying noncontinuous spatial domains remains difficult, and we further identified limitations of existing methods, notably their inadequacy for recent large-scale tasks. Furthermore, using 145 simulated datasets, we examined the robustness of these methods against four different factors and assessed the impact of pre- and postprocessing approaches. Our study offers a comprehensive evaluation of existing spatial clustering methods for SRT data, paving the way for future advancements in this rapidly evolving field.
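The reference list cites Moran's I and Geary's C, two standard spatial autocorrelation statistics. This section does not say exactly how the benchmark applies them (for example, to domain labels or to marker genes of detected domains), so the sketch below only illustrates the statistics themselves for a single feature, on a binary, symmetrized k-nearest-neighbour weight matrix. The choice of k = 6, the dense matrix and the toy grid are assumptions for a small example, not the benchmark's settings.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_weights(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """Binary, symmetrized k-nearest-neighbour weights (dense; small data only)."""
    n = coords.shape[0]
    # Ask for k + 1 neighbours because each spot's nearest neighbour is itself.
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
    # i and j are neighbours if either is among the other's k nearest spots.
    return np.maximum(w, w.T)


def morans_i(x: np.ndarray, w: np.ndarray) -> float:
    """Moran's I: positive when neighbours are similar, close to 0 for random patterns."""
    z = x - x.mean()
    return float(x.size / w.sum() * (z @ w @ z) / (z @ z))


def gearys_c(x: np.ndarray, w: np.ndarray) -> float:
    """Geary's C: below 1 when neighbours are similar, close to 1 for random patterns."""
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return float((x.size - 1) / (2 * w.sum()) * (w * sq_diff).sum() / (z @ z))


# Toy 20 x 20 grid of spots with a left-to-right expression gradient.
ys, xs = np.mgrid[0:20, 0:20]
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
w = knn_weights(coords, k=6)
smooth = coords[:, 0].copy()
shuffled = np.random.default_rng(0).permutation(smooth)

print(morans_i(smooth, w), gearys_c(smooth, w))      # high I, low C
print(morans_i(shuffled, w), gearys_c(shuffled, w))  # I near 0, C near 1
```

Shuffling the same values over the grid keeps the distribution fixed and destroys only the spatial arrangement, which is what both statistics are designed to detect.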
Data availability
Data1 to Data12 were downloaded from ref. 64. Data13 to Data21 are available from ref. 65. Data22 to Data24 were downloaded from ref. 66. Data25 to Data29 were downloaded from ref. 67. Data30 was downloaded from ref. 68. Data31 to Data33 are available from ref. 69. Data34 was downloaded from ref. 69. Data35 to Data41 were downloaded from ref. 70. Data42 to Data54 were downloaded from https://www.livercellatlas.org/. Data55 and Data56 are available from the Gene Expression Omnibus under accession GSE111672. Data57 to Data87 were downloaded from ref. 71. Source data are provided with this paper.
Code availability
The code and scripts used for data preprocessing and visualization are available at https://github.com/zhaofangyuan98/SDMBench. Our benchmarking workflow is provided as a reproducible pipeline at https://github.com/zhaofangyuan98/SDMBench/tree/main/SDMBench. We also provide a tutorial at https://github.com/zhaofangyuan98/SDMBench/tree/main/Tutorial.
References
Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet. https://doi.org/10.1038/s41576-023-00580-2 (2023).
Seferbekova, Z., Lomakin, A., Yates, L. R. & Gerstung, M. Spatial biology of cancer evolution. Nat. Rev. Genet. https://doi.org/10.1038/s41576-022-00553-x (2022).
Moffitt, J. R., Lundberg, E. & Heyn, H. The emerging landscape of spatial profiling technologies. Nat. Rev. Genet. https://doi.org/10.1038/s41576-022-00515-3 (2022).
Zeng, H. et al. Spatially resolved single-cell translatomics at molecular resolution. Science 380, eadd3067 (2023).
Shi, H. et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature https://doi.org/10.1038/s41586-023-06569-5 (2023).
Chen, A. et al. Single-cell spatial transcriptome reveals cell-type organization in the macaque cortex. Cell 186, 3726–3743.e24 (2023).
Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).
Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
Chang, Y. et al. Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning. Comput. Struct. Biotechnol. J. 20, 4600–4617 (2022).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder. Nat. Commun. https://doi.org/10.1038/s41467-022-29439-6 (2021).
Fu, H. et al. Unsupervised spatial embedded deep representation of spatial transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2021.06.15.448542 (2021).
Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci. 2, 399–408 (2022).
Yuan, Z. et al. SOTIP is a versatile method for microenvironment modeling with spatial omics data. Nat. Commun. 13, 7330 (2022).
Yang, M. et al. Position-informed contrastive learning for spatially resolved omics deciphers hierarchical tissue structure at both cellular and niche levels. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-1067780/v1 (2022).
Cable, D. M. et al. Cell type-specific inference of differential expression in spatial transcriptomics. Nat. Methods 19, 1076–1087 (2022).
Zeng, H. et al. Integrative in situ mapping of single-cell transcriptional states and tissue histopathology in a mouse model of Alzheimer’s disease. Nat. Neurosci. https://doi.org/10.1038/s41593-022-01251-x (2023).
Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-01182-1 (2022).
Rao, A., Barkley, D., Franca, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
Cheng, A., Hu, G. & Li, W. V. Benchmarking cell-type clustering methods for spatially resolved transcriptomics data. Brief. Bioinform. 24, bbac475 (2023).
Xu, Z. et al. STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Res. 52, D1053–D1061 (2024).
Long, B., Miller, J. & The SpaceTx Consortium. SpaceTx: a roadmap for benchmarking spatial transcriptomics exploration of the brain. Preprint at arXiv https://arxiv.org/abs/2301.08436 (2023).
Megill, C. et al. Cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. Preprint at bioRxiv https://doi.org/10.1101/2021.04.05.438318 (2021).
Fan, Z., Chen, R. & Chen, X. SpatialDB: a database for spatially resolved transcriptomes. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz934 (2019).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Yuan, Z. et al. SODB facilitates comprehensive exploration of spatial omics data. Nat. Methods https://doi.org/10.1038/s41592-023-01773-7 (2023).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022).
Chen, X., Sun, Y.-C., Church, G. M., Lee, J. H. & Zador, A. M. Efficient in situ barcode sequencing using padlock probe-based BaristaSeq. Nucleic Acids Res. 46, e22 (2018).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. Y. & Zhuang, X. W. Spatially resolved, highly multiplexed RNA profiling in single cells. Science https://doi.org/10.1126/science.aaa6090 (2015).
Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076 (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-00935-2 (2021).
Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (2023).
Cang, Z., Ning, X., Nie, A., Xu, M. & Zhang, J. SCAN-IT: domain segmentation of spatial transcriptomics images by graph neural network. In 32nd British Machine Vision Conference https://www.bmvc2021-virtualconference.com/conference/papers/paper_1139.html (2021).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Zong, Y. et al. conST: an interpretable multi-modal contrastive learning framework for spatial transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2022.01.14.476408 (2022).
Li, Z. & Zhou, X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 23, 168 (2022).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).
Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).
Rosenberg, A. & Hirschberg, J. V-measure: a conditional entropy-based external cluster evaluation measure. In Proc. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) 410–420 (2007).
Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
Zuo, C. et al. Elucidating tumor heterogeneity from spatially resolved transcriptomics data by multi-view graph collaborative learning. Nat. Commun. 13, 5962 (2022).
Moran, P. A. Notes on continuous stochastic phenomena. Biometrika 37, 17–23 (1950).
Geary, R. C. The contiguity ratio and statistical mapping. Incorp. Stat. 5, 115–146 (1954).
Fang, R. et al. Conservation and divergence of cortical cell organization in human and mouse revealed by MERFISH. Science 377, 56–62 (2022).
Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12, 6012 (2021).
Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell 185, 379–396.e38 (2022).
Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-01006-2 (2021).
Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell https://doi.org/10.1016/j.cell.2022.12.010 (2022).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Wolf, F. A. et al. Louvain usage in Scanpy. Scanpy https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.louvain.html (2018).
Wolf, F. A. et al. Leiden usage in Scanpy. Scanpy https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.leiden.html (2018).
Hao, M., Hua, K. & Zhang, X. SOMDE: a scalable method for identifying spatially variable genes with self-organizing map. Bioinformatics https://doi.org/10.1093/bioinformatics/btab471 (2021).
Sun, S., Zhu, J. & Zhou, X. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat. Methods 17, 193–200 (2020).
Sun, S. et al. SPARK usage for spatially variable gene detection. Xiang Zhou Lab https://xzhoulab.github.io/SPARK/ (2020).
Maynard, K. R. et al. spatialLIBD for hosting dorsolateral prefrontal cortex 10x Visium dataset. spatialLIBD http://research.libd.org/spatialLIBD (2021).
Xu, Z. et al. STOmicsDB database page of mouse embryo Stereo-seq dataset. China National GeneBank https://db.cngb.org/stomics/mosta/ (2022).
Long, B. et al. Webpage of SpaceTx. The SpaceTx Consortium https://spacetx.github.io/ (2023).
Moffitt, J. R. et al. Data from: Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Dryad. https://doi.org/10.5061/dryad.8t8s248 (2018).
Codeluppi, S. et al. Data and code availability. Expression data: loom file with osmFISH data. Linnarsson Lab http://linnarssonlab.org/osmFISH/availability/ (2018).
Wang, X. et al. Data from: Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Deisseroth Lab http://clarityresourcecenter.org/ (2018).
Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Zenodo https://doi.org/10.5281/zenodo.4751624 (2021).
Allen, W. E. et al. Molecular and spatial signatures of mouse brain aging at single-cell resolution. CZ CELLxGENE https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a799-b6acdd2bac2e (2022).
Wang, J. et al. scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat. Commun. 12, 1882 (2021).
Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
Wang, B., Zhu, J. J., Pierson, E., Ramazzotti, D. & Batzoglou, S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat. Methods 14, 414–416 (2017).
Pedregosa, F. et al. Homogeneity score usage in scikit-learn. scikit-learn https://scikit-learn.org/stable/modules/generated/sklearn.metrics.homogeneity_score.html (2014).
Pedregosa, F. et al. Completeness score usage in scikit-learn. scikit-learn https://scikit-learn.org/stable/modules/generated/sklearn.metrics.completeness_score.html (2014).
Alexandrov, T. & Bartels, A. Testing for presence of known and unknown molecules in imaging mass spectrometry. Bioinformatics 29, 2335–2342 (2013).
Guo, L. et al. Data filtering and its prioritization in pipelines for spatial segmentation of mass spectrometry imaging. Anal. Chem. 93, 4788–4793 (2021).
Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
Miller, B. F., Bambah-Mukku, D., Dulac, C., Zhuang, X. & Fan, J. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res. 31, 1843–1855 (2021).
Ren, H. et al. SpaceFlow. GitHub https://github.com/hongleir/SpaceFlow (2022).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods https://doi.org/10.1038/s41592-021-01358-2 (2022).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference Vol. 445 (eds van der Walt, S. & Millman, J.) 51–56 (2010).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. L. Seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Davis, M., Sick, J. & Eschbacher, A. palettable: color palettes for Python. Astrophysics Source Code Library ascl: 2202.2005 (2022).
Acknowledgements
This study was supported by the National Natural Science Foundation of China (62303119, Z.Y.), the Chenguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission (22CGA02, Z.Y.), Shanghai Science and Technology Development Funds (23YF1403000, Z.Y.), the Tencent AI Lab Rhino-Bird Focused Research Program (RBFR2023008, Z.Y.), the Innovation Fund of the Institute of Computing Technology, CAS (E161080 and E161030, Yi Zhao) and the Beijing Natural Science Foundation Haidian Origination and Innovation Joint Fund (L222007, Yi Zhao). This work was also supported by the Shanghai Municipal Science and Technology Major Project (no. 2018SHZDZX01), ZJ Lab, the Shanghai Center for Brain Science and Brain-Inspired Technology and the 111 Project (no. B18015). The authors acknowledge the Nanjing Institute of InforSuperBahn MLOps for providing the training and evaluation platform.
Author information
Contributions
Yi Zhao and Z.Y. conceived and designed the study. Z.Y. and Yi Zhao designed the metrics and benchmark pipeline and collected the methods and datasets. F.Z. and Z.Y. implemented the benchmarking pipeline. Z.Y. implemented the divide-and-conquer strategy. Z.Y. and F.Z. analyzed the results and generated the figures. Z.Y., F.Z. and Yi Zhao wrote the manuscript. Yu Zhao, J.Y. and Y.C. helped implement scalability to large datasets. X.Z. and J.Y. provided tissue anatomical knowledge. S.L. helped re-implement the methods.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Karoline Holler and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Madhura Mukhopadhyay, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 The differences between spatial clustering and cell type clustering.
Spatial clustering and cell type clustering are different tasks. We explain their differences in terms of goals, features and representative work, and use an example from mouse motor cortex data to illustrate them.
Extended Data Fig. 2 Methods performance on various biotechnologies.
In the heatmap, rows represent biotechnologies, columns represent methods, and each value is the NMI of a method on a biotechnology.
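A heatmap like this one can be assembled from per-run NMI scores. The sketch below is a minimal illustration with scikit-learn and pandas, assuming each run is stored as a dict with the hypothetical keys `technology`, `method`, `truth` and `pred`, and that runs on several samples of the same technology are averaged (the caption does not state how samples are aggregated). The method names `method_A` and `method_B` are placeholders. Because the reference list also cites the V-measure, the example checks a documented scikit-learn property: with its default arithmetic normalization, NMI equals the V-measure.

```python
import pandas as pd
from sklearn.metrics import normalized_mutual_info_score, v_measure_score


def nmi_table(runs: list[dict]) -> pd.DataFrame:
    """Technology x method table of NMI, averaged over samples (an assumption)."""
    records = [
        {
            "technology": r["technology"],
            "method": r["method"],
            "nmi": normalized_mutual_info_score(r["truth"], r["pred"]),
        }
        for r in runs
    ]
    return pd.DataFrame(records).pivot_table(
        index="technology", columns="method", values="nmi", aggfunc="mean"
    )


truth = [0, 0, 1, 1, 2, 2]
runs = [
    # NMI ignores label names: a relabelled perfect clustering scores 1.
    {"technology": "Visium", "method": "method_A", "truth": truth, "pred": [2, 2, 0, 0, 1, 1]},
    # Putting every spot in one cluster carries no information: NMI is 0.
    {"technology": "Visium", "method": "method_B", "truth": truth, "pred": [0] * 6},
    {"technology": "MERFISH", "method": "method_A", "truth": truth, "pred": [0, 0, 1, 2, 2, 2]},
]
table = nmi_table(runs)
print(table)

# With the default arithmetic normalization, NMI and V-measure coincide.
pred = [0, 0, 1, 2, 2, 2]
print(normalized_mutual_info_score(truth, pred), v_measure_score(truth, pred))
```

Missing method/technology combinations simply appear as NaN in the pivoted table, which a plotting library can render as blank heatmap cells.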
Extended Data Fig. 3 User guidance.
Recommended methods according to the data at hand. Note that the recommendations are based on accuracy scores; for more specific recommendations, users should refer to Fig. 4 for other aspects of performance.
Extended Data Fig. 4 Performance on challenging datasets.
A: IoU of all methods on datasets with small and noncontinuous spatial domains, where Data35 to Data41 are breast cancer data and Data42 to Data54 are liver data. B: Number of successful identifications (IoU ≥ 0.5) for each method.
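The caption counts a domain as identified when IoU ≥ 0.5 but does not say how a predicted cluster is matched to the ground-truth domain. The minimal sketch below assumes the simplest rule: score the target domain against every predicted cluster and keep the best IoU. The labels `normal` and `tumor` are toy placeholders, not data from the benchmark.

```python
import numpy as np


def best_iou(truth: np.ndarray, pred: np.ndarray, target) -> float:
    """Highest IoU between ground-truth domain `target` and any predicted cluster."""
    in_target = truth == target
    best = 0.0
    for cluster in np.unique(pred):
        in_cluster = pred == cluster
        inter = np.logical_and(in_target, in_cluster).sum()
        union = np.logical_or(in_target, in_cluster).sum()
        best = max(best, inter / union)
    return float(best)


def identified(truth: np.ndarray, pred: np.ndarray, target, threshold: float = 0.5) -> bool:
    """A domain counts as identified if its best-matching cluster reaches the threshold."""
    return best_iou(truth, pred, target) >= threshold


# Ten spots: a small three-spot "tumor" domain; cluster 1 covers it plus one extra spot.
truth = np.array(["normal"] * 7 + ["tumor"] * 3)
pred = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
print(best_iou(truth, pred, "tumor"))    # 0.75
print(identified(truth, pred, "tumor"))  # True
```

Under this rule, a method that merges the small domain into a large cluster is penalized by the inflated union, which is why small domains are hard to reach the 0.5 threshold.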
Extended Data Fig. 5 Limitations of current methods on large-scale datasets.
A large-scale MERFISH dataset was used to illustrate that current methods cannot be applied to this dataset. A: Information on the dataset. B: Other large-scale datasets available in the field. Each point is a dataset; x is the number of cells and y is the number of slices. The publication information is annotated beside the points. Colors indicate different spatial technologies. C: Issues of each method when applied to the dataset in A. A time issue means the running time exceeded 5 hours, and a memory issue means the program reported an "out of memory" error. Computational resources are described in Methods. D: Running time of BASS and STAGATE as a function of the number of slices of the dataset in A.
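Panel C sorts failed runs into time issues (over 5 hours) and memory issues (an out-of-memory error). The sketch below shows that bookkeeping in Python under loud assumptions: the method is called as a plain function (`toy_method` and `toy_oom` are hypothetical stand-ins), only Python's `MemoryError` is treated as out-of-memory (GPU frameworks raise their own exception types), and the time limit is checked after the call returns rather than enforced. The benchmark's own pipeline may instead kill jobs at the scheduler level.

```python
import time

TIME_LIMIT_S = 5 * 60 * 60  # 5 hours, as stated in the caption


def run_with_status(fn, *args, time_limit_s: float = TIME_LIMIT_S, **kwargs):
    """Call fn and return (result, status, elapsed_seconds).

    status is 'ok', 'time issue' or 'memory issue'. The elapsed time is
    compared with the limit only after fn returns; an overrunning job is
    not interrupted.
    """
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    except MemoryError:
        return None, "memory issue", time.perf_counter() - start
    elapsed = time.perf_counter() - start
    if elapsed > time_limit_s:
        return None, "time issue", elapsed
    return result, "ok", elapsed


def toy_method(n: int) -> int:
    """Hypothetical stand-in for a clustering method."""
    return sum(range(n))


def toy_oom() -> None:
    """Hypothetical stand-in for a method that runs out of memory."""
    raise MemoryError


print(run_with_status(toy_method, 1000)[1])  # ok
print(run_with_status(toy_oom)[1])           # memory issue
```

Recording the elapsed time alongside the status also yields the kind of running-time curve shown in panel D when the same method is rerun on increasing numbers of slices.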
Supplementary information
Supplementary Information
Supplementary Figs. 1–51 and Notes 1–13.
Supplementary Tables 1–3
Supplementary Table 1. Data information. Supplementary Table 2. Running status of benchmarking methods. Supplementary Table 3. Parameter searching range of benchmarking methods.
Source data
Source Data Fig. 1
Raw data of bar plots in Fig. 1b.
Source Data Fig. 2
Raw data of methods benchmarking for MERFISH and Visium data in Fig. 2.
Source Data Fig. 3
Raw data of correlation matrix in Fig. 3.
Source Data Fig. 4
Raw data of overall performance comparisons in Fig. 4.
Source Data Fig. 5
Raw data of large-scale scalability in Fig. 5.
Source Data Fig. 6
Raw data of robustness evaluations in Fig. 6.
Source Data Extended Data Fig. 5
Raw data of running time in Extended Data Fig. 5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yuan, Z., Zhao, F., Lin, S. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods 21, 712–722 (2024). https://doi.org/10.1038/s41592-024-02215-8
DOI: https://doi.org/10.1038/s41592-024-02215-8