Annotation of spatially resolved single-cell data with STELLAR

Brbić, Maria; Cao, Kaidi; Hickey, John W.; Tan, Yuqi; Snyder, Michael P.; Nolan, Garry P.; Leskovec, Jure

doi:10.1038/s41592-022-01651-8

Article
Published: 24 October 2022

Nature Methods volume 19, pages 1411–1418 (2022)

Abstract

Accurate cell-type annotation from spatially resolved single cells is crucial to understand functional spatial biology that is the basis of tissue organization. However, current computational methods for annotating spatially resolved single-cell data are typically based on techniques established for dissociated single-cell technologies and thus do not take spatial organization into account. Here we present STELLAR, a geometric deep learning method for cell-type discovery and identification in spatially resolved single-cell datasets. STELLAR automatically assigns cells to cell types present in the annotated reference dataset and discovers novel cell types and cell states. STELLAR transfers annotations across different dissection regions, different tissues and different donors, and learns cell representations that capture higher-order tissue structures. We successfully applied STELLAR to CODEX multiplexed fluorescent microscopy data and multiplexed RNA imaging datasets. Within the Human BioMolecular Atlas Program, STELLAR has annotated 2.6 million spatially resolved single cells with dramatic time savings.

**Fig. 1: STELLAR is a geometric deep learning framework for annotating spatially resolved single-cell datasets.**

**Fig. 2: STELLAR accurately identifies cell types from the reference set and discovers novel cell types that have never been characterized in the reference set.**

**Fig. 3: STELLAR transfers granular cell-type labels across tissue regions and donors from HuBMAP data and identifies main structures of healthy human intestine tissue.**

**Fig. 4: STELLAR’s embeddings reveal higher-order tissue structures.**

Data availability

The CODEX datasets presented in this study can be found in the online repository Dryad at https://datadryad.org/stash/share/1OQtxew0Unh3iAdP-ELew-ctwuPTBz6Oy8uuyxqliZk. Specifically, the quantified single-cell data are provided (with cells in rows and protein expression, xy position and cell-type labels in columns). Additionally, we provide datasets used to transfer from the tonsil to BE tissue (BE_Tonsil_dryad.csv) and the expert-annotated healthy human intestine dataset (B004_training_dryad.csv), which was used both to test the accuracy of STELLAR across the four colon regions of this dataset and as training data for transferring cell-type labels to unlabeled donors (B0056_unannotated_dryad.csv). MERFISH mouse cortex datasets are from ref. 8.
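
As a rough illustration of the tabular layout described above (cells in rows; protein expression, xy position and cell-type labels in columns), the following sketch splits such a table into expression, coordinate and label lists. The column names `x`, `y` and `cell_type`, the marker names and the toy values are hypothetical; the actual headers in the Dryad files may differ.

```python
import csv
import io

# Hypothetical stand-in for one of the quantified single-cell CSV files.
# Column names and values are assumptions, not the actual Dryad contents.
EXAMPLE_CSV = """CD3,CD20,x,y,cell_type
0.91,0.02,10.5,4.0,T cell
0.05,0.88,11.0,4.5,B cell
0.07,0.93,50.2,9.1,B cell
"""

def split_cell_table(text, coord_cols=("x", "y"), label_col="cell_type"):
    """Split a cells-by-columns table into expression, coordinates and labels."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column that is neither a coordinate nor the label is a marker.
    marker_cols = [c for c in reader.fieldnames
                   if c not in coord_cols and c != label_col]
    expression, coords, labels = [], [], []
    for row in reader:
        expression.append([float(row[c]) for c in marker_cols])
        coords.append(tuple(float(row[c]) for c in coord_cols))
        labels.append(row[label_col])
    return marker_cols, expression, coords, labels

markers, expression, coords, labels = split_cell_table(EXAMPLE_CSV)
```

For an unannotated file, the label column would simply be absent, so a real loader would need to treat `label_col` as optional.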

Code availability

STELLAR was written in Python v.3.8 using the PyTorch library. The source code is available on GitHub at https://github.com/snap-stanford/stellar. The project website with links to data and code can be accessed at http://snap.stanford.edu/stellar/.

References

1. Lewis, S. M. et al. Spatial omics and multiplexed imaging to explore cancer biology. Nat. Methods 18, 997–1012 (2021).
2. Bodenmiller, B. Multiplexed epitope-based tissue imaging for discovery and healthcare applications. Cell Systems 2, 225–238 (2016).
3. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
4. Hickey, J. W. et al. Spatial mapping of protein composition and tissue organization: a primer for multiplexed antibody-based imaging. Nat. Methods 19, 284–295 (2021).
5. HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).
6. Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).
7. Regev, A. et al. Science forum: the Human Cell Atlas. eLife 6, e27041 (2017).
8. Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).
9. Black, S. et al. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nature Protocols 16, 3802–3802 (2021).
10. Goltsev, Y. et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174, 968–981 (2018).
11. Teng, H., Yuan, Y. & Bar-Joseph, Z. Clustering spatial transcriptomics data. Bioinformatics 38, 997–1004 (2021).
12. Partel, G. & Wählby, C. Spage2vec: unsupervised representation of localized spatial gene expression signatures. FEBS J 288, 1859–1870 (2021).
13. Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotech. 39, 1375–1384 (2021).
14. Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
15. Zeng, Z., Li, Y., Li, Y. & Luo, Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol 23, 83 (2022).
16. Zhang, W. et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat. Methods 19, 759–769 (2022).
17. Hickey, J. W. et al. High resolution single cell maps reveals distinct cell organization and function across different regions of the human intestine. Preprint at bioRxiv (2021).
18. Greenbaum, S. et al. Spatio-temporal coordination at the maternal-fetal interface promotes trophoblast invasion and vascular remodeling in the first half of human pregnancy. Preprint at bioRxiv (2021).
19. Currlin, S. et al. 3D-mapping of human lymph node and spleen reveals integrated neuronal, vascular, and ductal cell networks. Preprint at bioRxiv (2021).
20. Neumann, E. K. et al. A multiscale atlas of the molecular and cellular architecture of the human kidney. Preprint at bioRxiv (2022).
21. Lake, B. B. et al. An atlas of healthy and injured cell states and niches in the human kidney. Preprint at bioRxiv (2021).
22. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (2016).
23. Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Proc. Adv. Neural Inform. Proc. Syst. 30 (eds Guyon, I. et al.) (2017).
24. Cao, K., Brbic, M. & Leskovec, J. Open-world semi-supervised learning. In Proc. International Conference on Learning Representations (2022).
25. Schürch, C. M. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 182, 1341–1359 (2020).
26. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (eds Krishnapuram, B. et al.) (2016).
27. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
28. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
29. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
30. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
31. Kimmel, J. C. & Kelley, D. R. Semi-supervised adversarial neural networks for single-cell classification. Genome Res. 31, 1781–1793 (2021).
32. Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).
33. Hickey, J. W., Tan, Y., Nolan, G. P. & Goltsev, Y. Strategies for accurate cell type identification in CODEX multiplexed imaging data. Front. Immunol. 3317 (2021).
34. Sautès-Fridman, C., Petitprez, F., Calderaro, J. & Fridman, W. H. Tertiary lymphoid structures in the era of cancer immunotherapy. Nat. Rev. Cancer 19, 307–325 (2019).
35. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
36. Hollandi, R. et al. Nucleus segmentation: towards automated solutions. Trends Cell Biol. 32, 295–310 (2022).
37. Van Buren, K. et al. Artificial intelligence and deep learning to map immune cell types in inflamed human tissue. J. Immunol. Methods 505, 113233 (2022).
38. Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018).
39. Liu, B. et al. Negative margin matters: understanding margin in few-shot classification. In Proc. European Conference on Computer Vision 438–455 (eds Vedaldi, A. et al.) (2020).
40. Chiang, W.-L. et al. Cluster-GCN: an efficient algorithm for training deep and large graph convolutional networks. In Proc. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 257–266 (eds Teredesai, A. et al.) (2019).

Acknowledgements

This work was supported by the US National Institutes of Health (grant nos. 2U19AI057229-16, 5P01HL10879707, 5R01GM10983604, 5R33CA18365403, 5U01AI101984-07, 5UH2AR06767604, 5R01CA19665703, 5U54CA20997103, 5F99CA212231-02, 1F32CA233203-01, 5U01AI140498-02, 1U54HG010426-01, 5U19AI100627-07, 1R01HL120724-01A1, R33CA183692, R01HL128173-04, 5P01AI131374-02, 5UG3DK114937-02, 1U19AI135976-01, IDIQ17X149, 1U2CCA233238-01 and 1U2CCA233195-01); Cancer Research UK (grant no. C27165/A29073); and the Parker Institute for Cancer Immunotherapy. J.W.H. was supported by an NIH T32 Fellowship (grant no. T32CA196585) and an American Cancer Society: Roaring Fork Valley Postdoctoral Fellowship (grant no. PF-20-032-01-CSM). We also gratefully acknowledge the support of DARPA under grant nos. HR00112190039 (TAMI), N660011924033 (MCS); ARO under grant nos. W911NF-16-1-0342 (MURI), W911NF-16-1-0171 (DURIP); NSF under grant nos. OAC-1835598 (CINES), OAC-1934578 (HDR), CCF-1918940 (Expeditions), IIS-2030477 (RAPID), NIH under grant no. R56LM013365; Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Amazon, JPMorgan Chase, Docomo, Hitachi, Juniper Networks, Intel, KDDI and Toshiba.

Author information

These authors contributed equally: Maria Brbić, Kaidi Cao, John W. Hickey.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, USA
Maria Brbić, Kaidi Cao & Jure Leskovec
School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Maria Brbić
Baxter Laboratories Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
John W. Hickey, Yuqi Tan & Garry P. Nolan
Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
Michael P. Snyder
Department of Pathology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
Garry P. Nolan

Contributions

M.B., K.C., J.W.H. and J.L. conceived the research. M.B., K.C., J.W.H. and Y.T. performed research and analyzed results. M.B., K.C. and J.L. contributed new analytical tools and created the algorithmic framework. J.W.H., M.P.S. and G.P.N. generated and analyzed the data. J.L., G.P.N. and M.P.S. supervised the research. All authors participated in interpretation and wrote the manuscript.

Corresponding authors

Correspondence to Garry P. Nolan or Jure Leskovec.

Ethics declarations

Competing interests

M.P.S. is cofounder and advisory board member of Personalis, Qbio, January AI, Mirvie, Filtricine, Fodsel, Protos, RTHM, Marble Therapeutics and Crosshair Therapeutics. G.P.N. has equity in and is a scientific advisory board member of Akoya Biosciences, Inc. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ellis Patrick, Darren Tyson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 STELLAR overview.

STELLAR is unique in its ability to simultaneously recognize cell types seen in the reference set and discover novel cell types that have never been characterized in the reference set. This is made possible by an objective function that consists of two main components (Methods). First, STELLAR learns to gradually separate cell types from the reference set by controlling intra-class variance, which allows the model to learn to discover novel cell types at the same time. Second, STELLAR discovers novel classes by generating auxiliary labels (pseudo-labels) in the unannotated graph that are used to guide the training. The auxiliary labels are generated based on the nearest neighbors of each cell in the embedding space.
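
The pseudo-labeling step can be pictured with a small sketch. It assumes cosine similarity as the measure of closeness in embedding space (an assumption; the caption only says "nearest neighbors"), pairs each cell with its most similar other cell, and treats that pair as a positive example. The function names and toy embeddings are illustrative and are not taken from the STELLAR code base.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest_neighbor_pairs(embeddings):
    """Pair every cell with its most similar other cell in embedding space.

    Returns (cell_index, neighbor_index) tuples that could serve as
    positive pseudo-label pairs (hypothetical helper, for illustration).
    """
    pairs = []
    for i, u in enumerate(embeddings):
        best_j, best_sim = None, -math.inf
        for j, v in enumerate(embeddings):
            if i == j:
                continue  # a cell is not its own neighbor
            sim = cosine(u, v)
            if sim > best_sim:
                best_j, best_sim = j, sim
        pairs.append((i, best_j))
    return pairs

# Toy embeddings: cells 0 and 1 point one way, cells 2 and 3 another.
toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
pairs = nearest_neighbor_pairs(toy)
```

The brute-force search is quadratic in the number of cells; on datasets of realistic size a batched or approximate neighbor search would be needed.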

Extended Data Fig. 2 CODEX image of reference dataset from human tonsil.

Ground-truth labels of the tonsil CODEX multiplexed imaging dataset. Colors denote different cell types.

Extended Data Fig. 3 Cell-type distributions on tonsil and BE datasets.

Cell-type distributions of ground-truth labels on (a) the tonsil reference dataset and (b) the Barrett’s esophagus dataset. PDPN stands for podoplanin-positive stromal cells.

Extended Data Fig. 4 Neighborhoods found in tonsil and Barrett’s esophagus (BE) datasets.

(a) Neighborhood heatmap showing the neighborhoods found across both tissues and cell types enriched compared to tissue averages. (b) Neighborhood composition between BE and tonsil tissues. (c) Neighborhood types mapped back to tissue coordinates.

Extended Data Fig. 5 Comparison of STELLAR to baseline methods on the Barrett’s esophagus (BE) dataset.

(a) Accuracy of STELLAR and scANVI on the BE dataset. Performance was evaluated as a mean score across n=5 runs of each method. Error bars show standard deviation. scANVI denotes the setting evaluated in the same manner as STELLAR, in which we train the model on the tonsil dataset and evaluate on the BE dataset. scANVI_leaky denotes the approach in which we use a fraction of labels from the BE dataset as the training data and the rest of the BE dataset as the test set. Although the setting in which scANVI_leaky is evaluated does not present a fair comparison to STELLAR and other baselines, it indicates that the performance drop of scANVI is caused by differences between the tonsil and BE datasets. (b-d) Performance of STELLAR and alternative baselines on the BE dataset evaluated as (b) mean macro F1-score, (c) macro precision score, and (d) macro recall score across n=5 runs of each method. Error bars show standard deviation. XGB stands for XGBoost, SVM for Support Vector Machine, RF for Random Forest, ADA for AdaBoost, and Seurat for Seurat V4.
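
For readers unfamiliar with macro-averaged scores, the sketch below computes macro precision, recall and F1 by scoring each cell type separately and averaging with equal weight, so that rare cell types count as much as abundant ones. It is a generic illustration rather than the evaluation script used for this figure; the class set is taken from the ground-truth labels, and the toy labels are made up.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over classes in y_true."""
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined ratios (no predictions or no positives) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example with one abundant, one intermediate and one rare cell type.
y_true = ["B", "B", "B", "T", "T", "DC"]
y_pred = ["B", "B", "T", "T", "T", "B"]
precision, recall, f1 = macro_scores(y_true, y_pred)
```

In this toy case the single missed DC cell pulls every macro score down by a full third of a class, whereas plain accuracy would drop by only one cell in six.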

Extended Data Fig. 6 Robustness of STELLAR evaluated on the Barrett’s esophagus (BE) dataset.

(a) Performance of STELLAR using different normalization strategies. ‘Unnorm’ stands for raw (unnormalized) data. Performance was evaluated as a mean accuracy score across n=5 runs of each normalization strategy. Error bars show standard deviation. (b) Performance of STELLAR when misannotating a proportion of randomly selected cells. In each run, cells were randomly selected in the annotated reference tonsil dataset and randomly assigned labels that differ from their ground-truth annotations. Performance was evaluated as an accuracy score across n=5 runs. Individual data points are shown. (c) Performance of STELLAR when removing different numbers of marker genes. In each run, a different set of randomly selected marker genes was withheld from the reference tonsil and BE datasets. Performance was evaluated as a mean accuracy score across n=5 runs. Error bars show standard deviation.
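
The misannotation experiment in panel (b) can be sketched as follows: a chosen proportion of reference cells is sampled at random, and each sampled cell receives a randomly chosen label that differs from its ground-truth label. The function name, the seed handling and the toy label counts are assumptions made for illustration only.

```python
import random

def misannotate(labels, proportion, seed=0):
    """Return a copy of labels in which a given proportion is replaced by a wrong label."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    noisy = list(labels)
    n_flip = round(proportion * len(labels))
    # Sample distinct cells, then give each one a label other than its own.
    for i in rng.sample(range(len(labels)), n_flip):
        wrong = [c for c in classes if c != labels[i]]
        noisy[i] = rng.choice(wrong)
    return noisy

# Toy reference annotation with three cell types.
reference_labels = ["B cell"] * 50 + ["T cell"] * 30 + ["Stroma"] * 20
noisy_labels = misannotate(reference_labels, proportion=0.2, seed=1)
n_changed = sum(a != b for a, b in zip(reference_labels, noisy_labels))
```

Because every sampled cell is forced to a different label, `n_changed` equals the requested number of corrupted cells exactly rather than only in expectation.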

Extended Data Fig. 7 Performance of STELLAR on the MERFISH dataset from mouse cortex.

We applied STELLAR to a large-scale mouse primary motor cortex MERFISH dataset consisting of 23 granular cell types from two mice [8]. (a) Annotation accuracy of STELLAR on the MERFISH mouse cortex dataset with different numbers of withheld cell types. Position of scatter plot points is computed as a mean accuracy score across n=5 runs. Error bars show standard deviation. We randomly removed a number of cell types from the reference set and evaluated STELLAR’s performance by gradually increasing the number of removed cell types. We measured accuracy separately on the reference cell types, on novel cell types withheld from the reference set during training, and jointly on all cell types. (b, c) UMAP visualization of the MERFISH mouse cortex dataset from the mouse used as the test set. Cells are colored according to (b) ground-truth annotations, and (c) STELLAR’s predictions without any withheld cell types.
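
The three accuracy values reported in panel (a) can be illustrated with a short sketch that splits test cells by whether their ground-truth type was withheld from the reference set. It assumes that predicted novel clusters have already been matched to ground-truth type names, a step the caption does not describe; the labels and predictions are toy values.

```python
def split_accuracy(y_true, y_pred, withheld):
    """Accuracy on reference (seen) types, withheld (novel) types and all cells."""
    def accuracy(pairs):
        return sum(t == p for t, p in pairs) / len(pairs) if pairs else float("nan")

    all_pairs = list(zip(y_true, y_pred))
    seen = [(t, p) for t, p in all_pairs if t not in withheld]
    novel = [(t, p) for t, p in all_pairs if t in withheld]
    return accuracy(seen), accuracy(novel), accuracy(all_pairs)

# Toy test set in which PVM was withheld from the reference annotations.
y_true = ["OPC", "OPC", "SMC", "SMC", "PVM", "PVM"]
y_pred = ["OPC", "OPC", "SMC", "PVM", "PVM", "SMC"]
seen_acc, novel_acc, all_acc = split_accuracy(y_true, y_pred, withheld={"PVM"})
```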

Extended Data Fig. 8 STELLAR predictions on the dataset from healthy intestine.

CODEX-imaged regions with cell types colored by prediction from STELLAR using data from the healthy intestine of a different donor as the reference set. Data from both small intestine and colon are shown. Colors denote different cell types. DC stands for dendritic cell, ICC stands for interstitial cells of Cajal, TA stands for transit amplifying cell.

Extended Data Fig. 9 Multicellular structures discovered by STELLAR on CODEX healthy intestine data.

Characterization of multicellular structures by clustering the embedding space from STELLAR on CODEX healthy intestine data. (a) Heatmap of average cell-type composition in clustered embeddings. (b) Representative tissue image colored by embedding structure. IEL stands for intraepithelial lymphocytes.
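
One way to build the matrix behind a heatmap like panel (a) is to compute, for every embedding cluster, the fraction of its cells that belong to each cell type. The sketch below does this under the assumption that "average cell-type composition" means within-cluster fractions; the analysis in the paper may instead average over spatial neighborhoods. Cluster ids and labels are toy values.

```python
from collections import Counter, defaultdict

def cluster_composition(cluster_ids, cell_types):
    """Fraction of each cell type within each cluster (one heatmap row per cluster)."""
    counts = defaultdict(Counter)
    for cluster, cell_type in zip(cluster_ids, cell_types):
        counts[cluster][cell_type] += 1
    types = sorted(set(cell_types))
    # Counter returns 0 for absent types, so every row has the same columns.
    return {
        cluster: {t: counter[t] / sum(counter.values()) for t in types}
        for cluster, counter in counts.items()
    }

# Toy input: cluster 0 is mostly TA cells, cluster 1 is entirely DCs.
composition = cluster_composition(
    [0, 0, 0, 0, 1, 1],
    ["TA", "TA", "TA", "IEL", "DC", "DC"],
)
```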

Extended Data Fig. 10 Multicellular structures discovered by STELLAR on MERFISH mouse cortex data.

Clusters in STELLAR’s embedding space identify multicellular structures in tissues in MERFISH data from mouse cortex. (a) Heatmap of average cell-type composition in STELLAR clustered embeddings. (b) Representative tissue image colored by overall structure. L, lateral; OPC, oligodendrocyte precursor cell; PVM, perivascular macrophage; SMC, smooth muscle cell; VLMC, vascular leptomeningeal cell.

Supplementary information

Supplementary Information

Supplementary Notes 1–4 and Figs. 1–5.

Reporting Summary

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Brbić, M., Cao, K., Hickey, J.W. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat Methods 19, 1411–1418 (2022). https://doi.org/10.1038/s41592-022-01651-8

Received: 24 November 2021
Accepted: 14 September 2022
Published: 24 October 2022
Issue Date: November 2022
DOI: https://doi.org/10.1038/s41592-022-01651-8
