Annotation of spatially resolved single-cell data with STELLAR

Abstract

Accurate cell-type annotation from spatially resolved single cells is crucial to understand functional spatial biology that is the basis of tissue organization. However, current computational methods for annotating spatially resolved single-cell data are typically based on techniques established for dissociated single-cell technologies and thus do not take spatial organization into account. Here we present STELLAR, a geometric deep learning method for cell-type discovery and identification in spatially resolved single-cell datasets. STELLAR automatically assigns cells to cell types present in the annotated reference dataset and discovers novel cell types and cell states. STELLAR transfers annotations across different dissection regions, different tissues and different donors, and learns cell representations that capture higher-order tissue structures. We successfully applied STELLAR to CODEX multiplexed fluorescent microscopy data and multiplexed RNA imaging datasets. Within the Human BioMolecular Atlas Program, STELLAR has annotated 2.6 million spatially resolved single cells with dramatic time savings.

Fig. 1: STELLAR is a geometric deep learning framework for annotating spatially resolved single-cell datasets.
Fig. 2: STELLAR accurately identifies cell types from the reference set and discovers novel cell types that have never been characterized in the reference set.
Fig. 3: STELLAR transfers granular cell-type labels across tissue regions and donors from HuBMAP data and identifies main structures of healthy human intestine tissue.
Fig. 4: STELLAR’s embeddings reveal higher-order tissue structures.

Data availability

The CODEX datasets presented in this study can be found in the online repository Dryad at https://datadryad.org/stash/share/1OQtxew0Unh3iAdP-ELew-ctwuPTBz6Oy8uuyxqliZk. Specifically, the quantified single-cell data are provided (with cells in rows and protein expression, xy position and cell-type labels in columns). Additionally, we provide the dataset used to transfer annotations from the tonsil to Barrett’s esophagus (BE) tissue (BE_Tonsil_dryad.csv) and the expert-annotated healthy human intestine dataset (B004_training_dryad.csv), which was used to test the accuracy of STELLAR across the four colon regions of this dataset and also as training data for transferring cell-type labels to unlabeled donors (B0056_unannotated_dryad.csv). MERFISH mouse cortex datasets are from Ref. 8.
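
As a rough illustration of the table layout described above (cells in rows; protein expression, xy position and cell-type labels in columns), the following Python sketch splits such a table into expression values, coordinates and labels. The column names `x`, `y` and `cell_type`, the marker names and the inline sample rows are assumptions made for illustration; the actual headers in the Dryad files may differ.

```python
import csv
import io

# Hypothetical miniature table mimicking the described layout.
# Column names and values are illustrative, not taken from the Dryad files.
SAMPLE_CSV = """CD3,CD20,x,y,cell_type
0.91,0.05,10.0,12.5,T cell
0.08,0.87,11.2,13.1,B cell
0.85,0.10,40.3,7.9,T cell
"""


def split_table(handle, coord_cols=("x", "y"), label_col="cell_type"):
    """Split a cells-by-columns table into expression, coordinates and labels."""
    reader = csv.DictReader(handle)
    # Every column that is neither a coordinate nor the label is treated
    # as a marker (protein expression) column.
    marker_cols = [c for c in reader.fieldnames
                   if c not in coord_cols and c != label_col]
    expression, coords, labels = [], [], []
    for row in reader:
        expression.append([float(row[c]) for c in marker_cols])
        coords.append(tuple(float(row[c]) for c in coord_cols))
        labels.append(row[label_col])
    return marker_cols, expression, coords, labels


markers, expression, coords, labels = split_table(io.StringIO(SAMPLE_CSV))
```

To read a downloaded file instead of the inline sample, the same function can be passed an open file handle, for example `open("B004_training_dryad.csv", newline="")`, after adjusting the assumed column names to the real headers.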

Code availability

STELLAR was written in Python v.3.8 using the PyTorch library. The source code is available on GitHub at https://github.com/snap-stanford/stellar. The project website with links to data and code can be accessed at http://snap.stanford.edu/stellar/.

References

  1. Lewis, S. M. et al. Spatial omics and multiplexed imaging to explore cancer biology. Nat. Methods 18, 997–1012 (2021).

  2. Bodenmiller, B. Multiplexed epitope-based tissue imaging for discovery and healthcare applications. Cell Systems 2, 225–238 (2016).

  3. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

  4. Hickey, J. W. et al. Spatial mapping of protein composition and tissue organization: a primer for multiplexed antibody-based imaging. Nat. Methods 19, 284–295 (2021).

  5. HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).

  6. Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).

  7. Regev, A. et al. Science forum: the Human Cell Atlas. eLife 6, e27041 (2017).

  8. Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).

  9. Black, S. et al. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nature Protocols 16, 3802–3835 (2021).

  10. Goltsev, Y. et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174, 968–981 (2018).

  11. Teng, H., Yuan, Y. & Bar-Joseph, Z. Clustering spatial transcriptomics data. Bioinformatics 38, 997–1004 (2021).

  12. Partel, G. & Wählby, C. Spage2vec: unsupervised representation of localized spatial gene expression signatures. FEBS J 288, 1859–1870 (2021).

  13. Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotech. 39, 1375–1384 (2021).

  14. Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).

  15. Zeng, Z., Li, Y., Li, Y. & Luo, Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol 23, 83 (2022).

  16. Zhang, W. et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat. Methods 19, 759–769 (2022).

  17. Hickey, J. W. et al. High resolution single cell maps reveals distinct cell organization and function across different regions of the human intestine. Preprint at bioRxiv (2021).

  18. Greenbaum, S. et al. Spatio-temporal coordination at the maternal-fetal interface promotes trophoblast invasion and vascular remodeling in the first half of human pregnancy. Preprint at bioRxiv (2021).

  19. Currlin, S. et al. 3D-mapping of human lymph node and spleen reveals integrated neuronal, vascular, and ductal cell networks. Preprint at bioRxiv (2021).

  20. Neumann, E. K. et al. A multiscale atlas of the molecular and cellular architecture of the human kidney. Preprint at bioRxiv (2022).

  21. Lake, B. B. et al. An atlas of healthy and injured cell states and niches in the human kidney. Preprint at bioRxiv (2021).

  22. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks, in Proc. International Conference on Learning Representations (2016).

  23. Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs, in Proc. Adv. Neural Inform. Proc. Syst. 30 (eds Guyon, I. et al.) (2017).

  24. Cao, K., Brbic, M. & Leskovec, J. Open-world semi-supervised learning, in Proc. International Conference on Learning Representations (2022).

  25. Schürch, C. M. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 182, 1341–1359 (2020).

  26. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system, in Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (eds Krishnapuram, B. et al.) (2016).

  27. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).

  28. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  29. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).

  30. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).

  31. Kimmel, J. C. & Kelley, D. R. Semi-supervised adversarial neural networks for single-cell classification. Genome Res. 31, 1781–1793 (2021).

  32. Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).

  33. Hickey, J. W., Tan, Y., Nolan, G. P. & Goltsev, Y. Strategies for accurate cell type identification in CODEX multiplexed imaging data. Front. Immunol. 3317 (2021).

  34. Sautès-Fridman, C., Petitprez, F., Calderaro, J. & Fridman, W. H. Tertiary lymphoid structures in the era of cancer immunotherapy. Nat. Rev. Cancer 19, 307–325 (2019).

  35. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).

  36. Hollandi, R. et al. Nucleus segmentation: towards automated solutions. Trends Cell Biol. 32, 295–310 (2022).

  37. Van Buren, K. et al. Artificial intelligence and deep learning to map immune cell types in inflamed human tissue. J. Immunol. Methods 505, 113233 (2022).

  38. Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018).

  39. Liu, B. et al. Negative margin matters: understanding margin in few-shot classification, in Proc. European Conference on Computer Vision, 438–455 (eds Vedaldi, A. et al.) (2020).

  40. Chiang, W.-L. et al. Cluster-GCN: an efficient algorithm for training deep and large graph convolutional networks, in Proc. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 257–266 (eds Teredesai, A. et al.) (2019).

Acknowledgements

This work was supported by the US National Institutes of Health (grant nos. 2U19AI057229-16, 5P01HL10879707, 5R01GM10983604, 5R33CA18365403, 5U01AI101984-07, 5UH2AR06767604, 5R01CA19665703, 5U54CA20997103, 5F99CA212231-02, 1F32CA233203-01, 5U01AI140498-02, 1U54HG010426-01, 5U19AI100627-07, 1R01HL120724-01A1, R33CA183692, R01HL128173-04, 5P01AI131374-02, 5UG3DK114937-02, 1U19AI135976-01, IDIQ17X149, 1U2CCA233238-01 and 1U2CCA233195-01); Cancer Research UK (grant no. C27165/A29073); and the Parker Institute for Cancer Immunotherapy. J.W.H. was supported by an NIH T32 Fellowship (grant no. T32CA196585) and an American Cancer Society: Roaring Fork Valley Postdoctoral Fellowship (grant no. PF-20-032-01-CSM). We also gratefully acknowledge the support of DARPA under grant nos. HR00112190039 (TAMI), N660011924033 (MCS); ARO under grant nos. W911NF-16-1-0342 (MURI), W911NF-16-1-0171 (DURIP); NSF under grant nos. OAC-1835598 (CINES), OAC-1934578 (HDR), CCF-1918940 (Expeditions), IIS-2030477 (RAPID), NIH under grant no. R56LM013365; Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Amazon, JPMorgan Chase, Docomo, Hitachi, Juniper Networks, Intel, KDDI and Toshiba.

Author information

Contributions

M.B., K.C., J.W.H. and J.L. conceived the research. M.B., K.C., J.W.H. and Y.T. performed research and analyzed results. M.B., K.C. and J.L. contributed new analytical tools and created the algorithmic framework. J.W.H., M.P.S. and G.P.N. generated and analyzed the data. J.L., G.P.N. and M.P.S. supervised the research. All authors participated in interpretation and wrote the manuscript.

Corresponding authors

Correspondence to Garry P. Nolan or Jure Leskovec.

Ethics declarations

Competing interests

M.P.S. is a cofounder and advisory board member of Personalis, Qbio, January AI, Mirvie, Filtricine, Fodsel, Protos, RTHM, Marble Therapeutics and Crosshair Therapeutics. G.P.N. has equity in and is a scientific advisory board member of Akoya Biosciences, Inc. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ellis Patrick, Darren Tyson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 STELLAR overview.

STELLAR is distinctive in its ability to simultaneously recognize cell types seen in the reference set and discover novel cell types that have never been characterized in the reference set. This is made possible by an objective function that consists of two main components (Methods). First, STELLAR learns to gradually separate cell types from the reference set by controlling intra-class variance, so that the model can learn to discover novel cell types at the same time. Second, STELLAR discovers novel classes by generating auxiliary labels (pseudo-labels) in the unannotated graph that are used to guide the training. The auxiliary labels are generated based on the nearest neighbors of each cell in the embedding space.
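
The nearest-neighbor step described above can be sketched as follows. This is a minimal illustration and not the STELLAR implementation: the use of cosine similarity, the function names and the toy embeddings are all assumptions, and the actual objective is specified in the paper's Methods.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbor_pairs(embeddings):
    """For each cell, return the index of its most similar other cell.

    Each pair (i, neighbors[i]) can then be treated as an auxiliary
    "same class" label when training on unannotated cells.
    """
    neighbors = []
    for i, u in enumerate(embeddings):
        best_j, best_sim = None, -math.inf
        for j, v in enumerate(embeddings):
            if i == j:
                continue  # a cell is never its own neighbor
            sim = cosine(u, v)
            if sim > best_sim:
                best_j, best_sim = j, sim
        neighbors.append(best_j)
    return neighbors


# Toy 2-D embeddings (hypothetical): two cells near each axis.
toy_embeddings = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
pairs = nearest_neighbor_pairs(toy_embeddings)
```

On the toy embeddings, cells 0 and 1 select each other, as do cells 2 and 3. The quadratic loop is only meant for readability; on real data a batched or approximate nearest-neighbor search would be the more practical choice.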

Extended Data Fig. 2 CODEX image of reference dataset from human tonsil.

Ground-truth labels of the tonsil CODEX multiplexed imaging dataset. Colors denote different cell types.

Extended Data Fig. 3 Cell-type distributions on tonsil and BE datasets.

Cell-type distributions of ground-truth labels on (a) tonsil reference dataset and (b) Barrett’s esophagus dataset. PDPN stands for podoplanin-positive stromal cells.

Extended Data Fig. 4 Neighborhoods found in tonsil and Barrett’s esophagus (BE) dataset.

(a) Neighborhood heatmap showing the neighborhoods found across both tissues and cell types enriched compared to tissue averages. (b) Neighborhood composition between BE and tonsil tissues. (c) Neighborhood types mapped back to tissue coordinates.

Extended Data Fig. 5 Comparison of STELLAR to baseline methods on the Barrett’s esophagus (BE) dataset.

(a) Accuracy of STELLAR and scANVI on the BE dataset. Performance was evaluated as a mean score across n=5 runs of each method. Error bars show the standard deviation. scANVI stands for the setting evaluated in the same manner as STELLAR, in which we train the model on the tonsil dataset and evaluate on the BE dataset. scANVI_leaky stands for the approach in which we use a fraction of the labels from the BE dataset as the training data and use the rest of the BE dataset as the test set. Although the setting in which scANVI_leaky is evaluated does not present a fair comparison to STELLAR and other baselines, it indicates that the performance drop of scANVI is caused by differences between the tonsil and BE datasets. (b-d) Performance of STELLAR and alternative baselines on the BE dataset evaluated as (b) mean macro F1-score, (c) macro precision score, and (d) macro recall score across n=5 runs of each method. Error bars show the standard deviation. XGB stands for XGBoost, SVM for Support Vector Machine, RF for Random Forest, ADA for AdaBoost, and Seurat for Seurat V4.
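
For readers unfamiliar with the macro-averaged scores used in panels b-d, the sketch below computes macro precision, recall and F1 from scratch and then summarizes repeated runs as mean and standard deviation. The labels and the per-run scores are hypothetical toy values, and the use of the sample standard deviation is an assumption, since the caption does not state which estimator was used.

```python
import statistics


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over classes in y_true."""
    precisions, recalls, f1s = [], [], []
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    # Macro averaging weights every class equally, regardless of its size.
    return (statistics.mean(precisions),
            statistics.mean(recalls),
            statistics.mean(f1s))


# Toy labels (hypothetical).
y_true = ["B", "B", "T", "T"]
y_pred = ["B", "T", "T", "T"]
precision, recall, f1 = macro_scores(y_true, y_pred)

# Hypothetical macro F1 values from n=5 runs, summarized as mean and
# sample standard deviation (the quantity an error bar would show).
run_scores = [0.70, 0.72, 0.74, 0.71, 0.73]
mean_f1 = statistics.mean(run_scores)
sd_f1 = statistics.stdev(run_scores)
```

Because each class contributes equally to a macro average, rare cell types influence these scores as much as abundant ones, which is why they complement plain accuracy on imbalanced tissue data.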

Extended Data Fig. 6 Robustness of STELLAR evaluated on the Barrett’s esophagus (BE) dataset.

(a) Performance of STELLAR using different normalization strategies. ‘Unnorm’ stands for raw (unnormalized) data. Performance was evaluated as a mean accuracy score across n=5 runs of each normalization strategy. Error bars show the standard deviation. (b) Performance of STELLAR when misannotating a proportion of randomly selected cells. In each run, cells in the annotated reference tonsil dataset were randomly selected and randomly assigned labels different from their ground-truth annotations. Performance was evaluated as an accuracy score across n=5 runs. Individual data points are shown. (c) Performance of STELLAR when removing different numbers of marker genes. In each run, a different set of randomly selected marker genes was withheld from the reference tonsil and BE datasets. Performance was evaluated as a mean accuracy score across n=5 runs. Error bars show the standard deviation.
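
The misannotation experiment in panel b can be mimicked with a small label-corruption routine. The sketch below is an assumption about how such noise might be injected, not the authors' code: the function name `misannotate`, the seed handling and the toy labels are hypothetical.

```python
import random


def misannotate(labels, proportion, seed=0):
    """Return a copy of `labels` with a random proportion replaced by a wrong label.

    Every selected cell receives a label drawn uniformly from the classes
    other than its ground-truth class, so each selected cell really changes.
    Requires at least two distinct classes.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n_flip = round(proportion * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        alternatives = [c for c in classes if c != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Toy reference labels (hypothetical); corrupt 20% of them.
reference_labels = ["B cell"] * 5 + ["T cell"] * 3 + ["Stroma"] * 2
noisy_labels = misannotate(reference_labels, proportion=0.2)
n_changed = sum(a != b for a, b in zip(reference_labels, noisy_labels))
```

Varying `seed` across runs gives different corrupted reference sets for the same noise level, which matches the idea of reporting individual data points for repeated runs.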

Extended Data Fig. 7 Performance of STELLAR on the MERFISH dataset from mouse cortex.

We applied STELLAR to a large-scale mouse primary motor cortex MERFISH dataset consisting of 23 granular cell types from two mice [8]. (a) Annotation accuracy of STELLAR on the MERFISH mouse cortex dataset with different numbers of withheld cell types. Scatter plot points show the mean accuracy score across n=5 runs. Error bars show the standard deviation. We randomly removed a number of cell types from the reference set and evaluated STELLAR’s performance by gradually increasing the number of removed cell types. We measured accuracy separately on classes seen in the reference set and classes withheld from the reference set. Performance is evaluated on the reference cell types, novel cell types withheld from the reference set during training, and jointly on all cell types. (b, c) UMAP visualization of the MERFISH mouse cortex dataset from the mouse used as the test set. Cells are colored according to (b) ground-truth annotations, and (c) STELLAR’s predictions without any withheld cell types.
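
The three accuracy values reported in panel a (reference cell types, withheld cell types, all cell types) can be computed along the following lines. This is a simplified sketch with hypothetical names and toy labels. It assumes that predictions for novel classes have already been mapped to ground-truth class names; in practice discovered clusters carry arbitrary identifiers and need a matching step first, which is omitted here.

```python
def accuracy(pairs):
    """Fraction of (true, predicted) pairs that agree; NaN if there are none."""
    if not pairs:
        return float("nan")
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def split_accuracy(y_true, y_pred, reference_classes):
    """Return accuracy on seen classes, on withheld classes, and on all cells."""
    pairs = list(zip(y_true, y_pred))
    seen = [(t, p) for t, p in pairs if t in reference_classes]
    novel = [(t, p) for t, p in pairs if t not in reference_classes]
    return accuracy(seen), accuracy(novel), accuracy(pairs)


# Toy example (hypothetical): type_C was withheld from the reference set.
y_true = ["type_A", "type_A", "type_B", "type_C", "type_C"]
y_pred = ["type_A", "type_B", "type_B", "type_C", "type_A"]
seen_acc, novel_acc, all_acc = split_accuracy(
    y_true, y_pred, reference_classes={"type_A", "type_B"}
)
```

Reporting the three numbers separately makes it visible when a method does well on reference classes but fails to recover the withheld ones, which a single pooled accuracy would hide.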

Extended Data Fig. 8 STELLAR predictions on the dataset from healthy intestine.

CODEX-imaged regions with cell types colored by prediction from STELLAR using data from the healthy intestine of a different donor as the reference set. Data from both small intestine and colon are shown. Colors denote different cell types. DC stands for dendritic cell, ICC stands for interstitial cells of Cajal, TA stands for transit amplifying cell.

Extended Data Fig. 9 Multicellular structures discovered by STELLAR on CODEX healthy intestine data.

Characterization of multicellular structures by clustering the embedding space from STELLAR on CODEX healthy intestine data. (a) Heatmap of average cell-type composition in clustered embeddings. (b) Representative tissue image colored by embedding structure. IEL stands for intraepithelial lymphocytes.
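
One plausible way to obtain the values behind a composition heatmap like panel a is to count, for each embedding cluster, the fraction of its cells belonging to each cell type. The sketch below shows that computation under this assumption; the cluster assignments and cell types are hypothetical toy data, and the exact procedure used for the figure may differ.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, cell_types):
    """Return {cluster: {cell_type: fraction of the cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, cell_type in zip(cluster_ids, cell_types):
        counts[cluster][cell_type] += 1
    all_types = sorted(set(cell_types))
    composition = {}
    for cluster in sorted(counts):
        total = sum(counts[cluster].values())
        # A Counter returns 0 for missing keys, so absent types get 0.0.
        composition[cluster] = {t: counts[cluster][t] / total for t in all_types}
    return composition


# Toy cluster assignments and cell types (hypothetical).
cluster_ids = [0, 0, 0, 1, 1]
cell_types = ["Enterocyte", "Enterocyte", "Goblet", "Plasma", "Plasma"]
composition = cluster_composition(cluster_ids, cell_types)
```

Each row of the resulting table sums to 1, so rows can be compared directly across clusters of different sizes when drawn as a heatmap.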

Extended Data Fig. 10 Multicellular structures discovered by STELLAR on MERFISH mouse cortex data.

Clusters in STELLAR’s embedding space identify multicellular structures in tissues in MERFISH data from mouse cortex. (a) Heatmap of average cell-type composition in STELLAR clustered embeddings. (b) Representative tissue image colored by overall structure. L, lateral; OPC, oligodendrocyte precursor cell; PVM, perivascular macrophage; SMC, smooth muscle cell; VLMC, vascular leptomeningeal cell.

Supplementary information

Supplementary Information

Supplementary Notes 1–4 and Figs. 1–5.

Reporting Summary

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Brbić, M., Cao, K., Hickey, J.W. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat Methods 19, 1411–1418 (2022). https://doi.org/10.1038/s41592-022-01651-8

  • DOI: https://doi.org/10.1038/s41592-022-01651-8
