Abstract
Spatial transcriptomics enables the simultaneous measurement of morphological features and transcriptional profiles of the same cells or regions in tissues. Here we present multi-modal structured embedding (MUSE), an approach to characterize cells and tissue regions by integrating morphological and spatially resolved transcriptional data. We demonstrate that MUSE can discover tissue subpopulations missed by either modality as well as compensate for modality-specific noise. We apply MUSE to diverse datasets containing spatial transcriptomics (seqFISH+, STARmap or Visium) and imaging (hematoxylin and eosin or fluorescence microscopy) modalities. MUSE identified biologically meaningful tissue subpopulations and stereotyped spatial patterning in healthy brain cortex and intestinal tissues. In diseased tissues, MUSE revealed gene biomarkers for proximity to tumor region and heterogeneity of amyloid precursor protein processing across Alzheimer brain regions. MUSE enables the integration of multi-modal data to provide insights into the states, functions and organization of cells in complex biological tissues.
Data availability
seqFISH+ mouse cortex dataset: Transcript data were downloaded from the GitHub page of the seqFISH+ project (https://github.com/CaiGroup/seqFISH-PLUS) on 1 August 2019. Nissl and DAPI stained images were provided by the authors of the seqFISH+ paper.
STARmap mouse cortex dataset: Raw data were downloaded from the project page (http://clarityresourcecenter.org/) on 2 July 2019. Transcript profiles and cell segmentation masks were extracted from data using the Python pipeline provided by the authors at https://github.com/weallen/STARmap.
PDAC dataset: Both spatial transcriptomics (including gene expression and H&E images) and scRNA-seq datasets were downloaded from the Gene Expression Omnibus (GEO) database with accession number GSE111672.
Intestine dataset: 10x Visium spatial transcriptomics data were downloaded from the GEO database with accession number GSE158328.
AD dataset: Raw and normalized count matrices of the spatial transcriptomics data were downloaded from the GEO database of the project (accession number GSE152506). Immunofluorescence images (Abeta, GFAP, NeuN and DAPI staining) that correspond to the spatial transcriptomics data were downloaded from the ‘synapse.org’ page of the project (https://www.synapse.org/#!Synapse:syn22153884/wiki/603937) on 31 October 2020.
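For readers who want to locate the GEO records listed above programmatically, the following is a minimal sketch, not part of the MUSE package. It assumes the usual GEO layout in which a series' supplementary files sit under a directory named after the accession with its last three digits replaced by "nnn". The helper name geo_suppl_url is hypothetical, and the individual file names in each directory are not given here and should be checked on the GEO record itself.

```python
import re

# Assumption: GEO series are served from this root over HTTPS.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_suppl_url(accession):
    """Build the supplementary-file directory URL for a GEO series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # Series are grouped in blocks of 1,000: GSE111672 -> GSE111nnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{accession}/suppl/"


if __name__ == "__main__":
    # Accessions taken from the data availability statement above.
    for acc in ("GSE111672", "GSE158328", "GSE152506"):
        print(acc, geo_suppl_url(acc))
```

The function only builds a URL; it performs no network access, so downloading and integrity checks are left to the reader's preferred tooling.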
Code availability
Simulation tool for multi-modal data generation: Simulation code is available from GitHub (https://github.com/AltschulerWu-Lab/MUSE).
MUSE: MUSE is provided as a Python package under the MIT license and can be installed through ‘pip install muse_sc’. Source code and demonstration code are available on GitHub (https://github.com/AltschulerWu-Lab/MUSE).
Change history
03 May 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41587-022-01340-z
Acknowledgements
We thank C.-H. L. Eng at Caltech for providing seqFISH+ image data; X. Wang at the Broad Institute and MIT for providing information on STARmap data analysis; R. Moncada at NYU for advice on PDAC data analysis; H. Koohy and A. Antanavicute from Oxford for providing full-resolution human intestine images; and O. Moindrot at Stanford for the open-source implementation of the triplet loss. We thank J. Bieber, H. Hammerlindl, L. Rao, X. Sun and other members of the Altschuler and Wu laboratories for constructive feedback. S.J.A. and L.F.W. gratefully acknowledge support from the UCSF Program for Breakthrough Biomedical Research, ProjectALS and the CZI NDNC Challenge Network. Q.D. was supported by the projects of NSFC (no. 62088102) and the MOST (no. 2020AA0105500). Y.D. was supported by the projects of NSFC (no. 61971020 and 62031001) and the MOST (no. 2020AAA0105502).
Author information
Contributions
F.B., Y.D. and Q.D. developed the approach and conducted simulation experiments. F.B., Y.D., S.W., B.W., S.Q.S., S.J.A. and L.F.W. conducted experimental analyses on biological datasets. The manuscript was written by F.B., Y.D., S.Q.S., S.J.A. and L.F.W. All authors read and approved the manuscript.
Ethics declarations
Competing interests
S.J.A. and L.F.W. have consulting agreements with Nine Square Therapeutics and BAKX Therapeutics involving cash and/or equity compensation. All other authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Itai Yanai, Raphael Gottardo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview and simulation studies of MUSE, related to Fig. 1.
Parameters used in the simulations are listed in Supplementary Table 1. (a) Summary of data and analysis used in this work. (b) A flowchart of the MUSE analysis pipeline. (c) Simulation design (Methods) to generate sample profiles with two modalities, used for (d-s) below. (d) tSNE visualizations of latent representations from single- and combined-modality methods for randomly selected simulation experiments in Fig. 1c. Colors: ground-truth subpopulation labels in simulation. (e) Evaluation of combined methods in simulated data with different ground-truth cluster numbers. n = 1,000 (top) and 3,000 (bottom) samples were considered in simulations. (Note: for n = 1,000 and cluster number ≥30, each cluster may contain only a small number of samples.) (f) Evaluation of multi-modal methods in simulated data with Gaussian noise of increasing variance (σ). (g) Clustering accuracies for (i) analyses of concatenated modality features using various normalization approaches (Methods), and (ii) MUSE multi-modal analysis on matched or unmatched (randomly permuted sample order on one modality) data. ARIs were calculated based on n = 10 repeats. Boxplot: center line, median; box, interquartile range; whiskers, minimum–maximum range; the same annotation also applies to other boxplots in this figure. (h) Example tSNE visualization of MUSE subpopulations (indicated by shapes) and simple superimposition of single-modality clusters (indicated by colors), with simulation parameters chosen as in (f). (i) Simulation design using real morphological features from STARmap (Methods; dataset details are described in Fig. 3) and performances of multimodal methods (right; n = 10). (j) Multimodal analysis on data with homogeneous features in one modality. Transcript profiles (left) were generated from a normal distribution while morphological features (middle) were simulated from known subpopulations as before.
(k) Evaluation of clustering accuracy under different dimensions of joint latent representations (n = 10). (l) Clustering accuracy of MUSE while changing the dimension of morphological features from 100 to 1,000 (n = 10). (m) Clustering accuracy of MUSE when fixing the latent representation of a single modality (hx, hy) to different dimensions. ARIs were averaged over 10 repeats. Red underlines: parameters selected as default. (n) Effects of clustering methods on accuracies (n = 10). Cluster numbers for hierarchical and k-means methods were chosen using the elbow method with distortion score. (o) Run times for compared methods on simulated data; n = 1,000 cells. Note: for fair comparison, all methods were run in CPU mode. (p) Run times of MUSE on datasets with larger sample sizes using different clustering methods for label updating during training. (q) Accuracies and run times when fixing single-modality labels (denoted as lx and ly in Methods) to the initial labels in training. Each dot represents one independent experiment. (r) Model structure of the multi-modal autoencoder used in MUSE. (s) Performance evaluation of MUSE with different hyperparameter settings (n = 10): 1) weight of regularization term; 2) weight of supervision term; 3) learning rate; and 4) iteration intervals between cluster updating in training. Red underlines: parameters selected as default in the MUSE package. (t) F-norms of selective matrices wx and wy for different true cluster numbers in the data (left) and for different choices of the regularization hyperparameter \(\lambda_{\mathrm{regularization}}\) (right); n = 10. (u) Clustering accuracies (left) and number of clusters (right) from PhenoGraph when changing the hyperparameter n_neighbor (n = 10).
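Clustering accuracy in the panels above is reported as the adjusted Rand index (ARI) between predicted and ground-truth labels. As a hedged illustration of what that number measures, the sketch below computes the ARI from the contingency table of two labelings in plain Python. The function name is an assumption made for this example, and the code is not taken from the MUSE package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)
    if n < 2:
        # No sample pairs to compare; treat as perfect agreement.
        return 1.0

    # Contingency counts: samples sharing a (true, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs grouped together in each cell and each margin.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # agreement if partitions match
    if maximum == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, renamed labels
    print(adjusted_rand_index(truth, [0, 0, 1, 2]))  # one true cluster split in two
```

The score depends only on which samples are grouped together, so renaming cluster labels leaves it unchanged, while splitting or merging true clusters lowers it.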
Extended Data Fig. 2 Analysis of mouse cortex dataset from seqFISH+, related to Fig. 2.
(a) tSNE visualization of latent space from deep image features, overlaid with various cellular properties from CellProfiler. (b) Layer annotations of MUSE clusters based on layer gene markers. Spatial localization of cell clusters (first column) and marker expression abundances (second column) are shown. For each cluster, the names of genes with maximal overexpression levels are underlined. Boxplot: center line, median; box, interquartile range; whiskers, minimum–maximum range. (c) Comparison of cortical layers discovered by transcriptional or combined methods. Five layers are shown. Squares with the same color spanning multiple layers indicate that the method merged those layers. Squares with no color indicate that the method failed to discover the corresponding layer. (d) Subclustering analysis of the transcript L2/3/4 cluster from Fig. 2c. k-means clustering was performed to divide L2/3/4 into two subclusters (middle). Spatial coherence with cortex layers is shown using cell density plots (right). (e) Comparison of subpopulations identified by different clustering methods from multimodal features. For k-means, the target cluster number (k) was set to the subpopulation size from MUSE analysis. (f) Shared up- and down-regulated glutamatergic marker genes between MUSE clusters and cell types from the Allen Brain Atlas. Marker genes were obtained from a recent Allen Brain Atlas publication; 36 markers were measured in both the seqFISH+ and Allen Brain datasets.
Extended Data Fig. 3 Comparison of methods on mouse cortex dataset from STARmap, related to Fig. 3.
(a) tSNE visualization of latent representations by different methods, with pseudo-colors labeling cortex depth along the x-coordinate (on the right side). (b) Comparison of cell clusters by (top) the number of identified clusters with or without significant spatial co-localization and (bottom) feature quality, evaluated as cluster compactness in latent space using the Silhouette coefficient. (c) Stability of identified clusters with respect to the choice of the hyperparameter n_neighbor in PhenoGraph. Red circles: major differences in subpopulations compared with the result using default parameters (left panel), annotated with the affected cortex layers. (d) Spatial mapping and annotations of clusters with significant spatial co-localization patterns. Significantly co-localized clusters were identified using a spatial co-localization score with a permutation test. Each cluster was assigned to one layer according to the anatomic annotations from the original paper (Methods). (e) tSNE visualization of MUSE clusters in MUSE latent space. All clusters were classified into ‘Refined’, ‘Reproduced’ or ‘Discovered’ types based on comparison with clusters identified from transcript-alone or morphology-alone analysis (corresponding to Fig. 3a). (f) 3D mapping of three types of MUSE clusters in the latent space of morphological features (top layer of each 3D plot), MUSE latent features (middle layer) or transcriptional features (bottom layer). Lines connect the same cells across the three spaces.
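Panel (d) relies on a permutation test of a spatial co-localization score. The score itself is defined in Methods and is not reproduced here, so the sketch below substitutes a simple stand-in (the mean pairwise distance between a cluster's cells) purely to illustrate the permutation logic. All names, the stand-in score and the default number of permutations are assumptions for this example and do not describe the authors' implementation.

```python
import random
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Stand-in compactness score: smaller means more spatially co-localized."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def colocalization_pvalue(coords, labels, cluster, n_perm=1000, seed=0):
    """Permutation p-value that `cluster` is more compact than random cells."""
    members = [c for c, lab in zip(coords, labels) if lab == cluster]
    if len(members) < 2:
        raise ValueError("cluster needs at least two cells")
    observed = mean_pairwise_distance(members)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Null model: the same number of cells drawn at random positions
        # from all cells in the tissue section.
        shuffled = rng.sample(coords, len(members))
        if mean_pairwise_distance(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    spread = [(10.0 * i, 10.0 * j) for i in range(1, 8) for j in range(1, 8)]
    coords = tight + spread
    labels = ["a"] * len(tight) + ["b"] * len(spread)
    print(colocalization_pvalue(coords, labels, "a", n_perm=200))
```

A small p-value indicates that the cluster's cells sit closer together than equally sized random subsets of cells, which is the sense in which a cluster would be called significantly co-localized under this stand-in score.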
Extended Data Fig. 4 Application of MUSE to a multimodal pancreatic ductal adenocarcinoma (PDAC) dataset, related to Fig. 4.
(a) tSNE visualizations of latent representations and identified clusters by transcripts-alone (left), H&E image-alone (middle) and MUSE (right) analyses (corresponding to Fig. 4a). (b) Manual histological annotations (colored lines) provided in the original publication overlaid with regional clusters (colored circles) from image analysis. Highlighted regions show the morphological differences. (c-d) Analysis of single-cell RNA-seq data from the same PDAC tissue. tSNE visualization with cell type annotations (c) and signature gene expression of two cancer clones (d). Cell type annotations are from the original publication. (e) Subclustering analysis of the transcript cancer region using the k-means method and comparison of clone signature expression between transcript subclusters and MUSE cancer regions. Boxplot: center line, median; box, interquartile range; whiskers, minimum–maximum range. n = 44 for subcluster 1 and n = 71 for subcluster 2. (f) Spatial expression maps of genes overexpressed in cancer regions (top) or pancreatic tissues (bottom), obtained through differential expression analysis between pancreatic tissue regions and cancer regions characterized by MUSE (Methods). (g) Cluster separateness of tissue image spots of different sizes. We segmented image tiles with different pixel sizes and input them into Inception-v3 to learn deep features. We then clustered the features and used the Silhouette score to quantify the separateness of clusters. Red arrow indicates the chosen region size.
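Panel (g) scores cluster separateness with the Silhouette score. As a rough illustration of that quantity only, the sketch below computes the mean silhouette over all samples in plain Python, assuming Euclidean distance between feature vectors. The Inception-v3 feature extraction and the tile-size sweep are not shown, and the function name is hypothetical rather than part of the MUSE package.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(features, [0, 0, 1, 1]))  # well-separated clusters
    print(silhouette_score(features, [0, 1, 0, 1]))  # poorly matched labels
```

Scores near 1 indicate compact, well-separated clusters and negative scores indicate samples that sit closer to another cluster, so comparing the score across candidate tile sizes is one way to pick a size, in the spirit of the panel.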
Extended Data Fig. 5 Application of MUSE to a Visium human intestine dataset, related to Fig. 5.
(a-b) tSNE visualizations of latent representations (a) and spatial plots (b) of identified clusters by transcripts-alone (left), H&E image-alone (middle) and MUSE (right) analyses. (c) Selected regions with various morphological patterns in the tissue. (d) Enhanced spatial maps of subpopulations from BayesSpace (left) or BayesSpace + MUSE (right). Details of the analysis are provided in Methods. (e) Selected zoomed-in region examples with marker gene expression or morphological patterns (top) and subpopulations defined by BayesSpace (middle) and MUSE (bottom) for the four cell types analyzed in Fig. 5.
Extended Data Fig. 6 Application of MUSE to a multimodal Alzheimer’s disease dataset, related to Fig. 6.
(a) A summary of samples collected in the Alzheimer’s disease dataset. (b) tSNE fitted on MUSE deep embeddings, with each spot colored by the Aβ index (defined by the standard deviation of intensity in the previous study). (c) Visualization of deep embeddings of Aβ spots at the same ages. Color annotations as in (a). (d) Proportion of samples from all four timepoints in each MUSE cluster, related to Fig. 6e.
Supplementary information
Supplementary Information
Supplementary Tables 1–6.
Cite this article
Bao, F., Deng, Y., Wan, S. et al. Integrative spatial analysis of cell morphologies and transcriptional states with MUSE. Nat Biotechnol 40, 1200–1209 (2022). https://doi.org/10.1038/s41587-022-01251-z
DOI: https://doi.org/10.1038/s41587-022-01251-z