Abstract
Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present Transcript Imputation with Spatial Single-cell Uncertainty Estimation (TISSUE) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Leveraging conformal inference, TISSUE provides well-calibrated prediction intervals for predicted expression values across 11 benchmark datasets. Moreover, it consistently reduces the false discovery rate for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers.
Data availability
All processed spatial transcriptomics and RNA-seq dataset pairings, including the final annotated adult mouse SVZ MERFISH dataset, have been deposited at https://doi.org/10.5281/zenodo.8259942. Other data files (raw images and large intermediate data files) can be provided upon reasonable request. Raw data were accessed from existing benchmark datasets (ref. 7) and are also available from the following studies:
Mouse hippocampus: Spatial transcriptomics (seqFISH) at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; RNA-seq (10x Chromium) at GSE158450 in the Gene Expression Omnibus (GEO) for ‘HIPP_sc_Rep1_10X sample’.
Mouse primary visual cortex: Spatial transcriptomics (MERFISH) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.
Mouse prefrontal cortex: Spatial transcriptomics (STARmap) at ‘20180419_BZ9_control’ in https://www.starmapresources.com/data; RNA-seq (10x Chromium) at GSE158450 in the GEO for ‘PFC_sc_Rep2_10X’.
Human middle temporal gyrus: Spatial transcriptomics (ISS) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-smart-seq.
Mouse primary visual cortex: Spatial transcriptomics (ISS) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.
Drosophila embryo: Spatial transcriptomics (FISH) at https://github.com/rajewsky-lab/distmap/; RNA-seq (Drop-seq) at GSE95025 in GEO.
Mouse somatosensory cortex: Spatial transcriptomics (osmFISH) at http://linnarssonlab.org/osmFISH/ for cortical region subset; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-smart-seq for mouse somatosensory cortex.
Mouse primary visual cortex: Spatial transcriptomics (ExSeq) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.
Mouse gastrulation: Spatial transcriptomics (seqFISH) at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; RNA-seq (10x Chromium) ‘Sample 21’ in the MouseGastrulationData R package.
Human U2OS: Spatial transcriptomics (MERFISH) at https://www.pnas.org/doi/suppl/10.1073/pnas.1912459116/suppl_file/pnas.1912459116.sd12.csv; RNA-seq (10x Chromium) at ‘BC22’ in GSE152048 in the GEO database.
Axolotl brain: Spatial transcriptomics (Stereo-seq) at ‘Stage44.h5ad’ in https://db.cngb.org/stomics/artista/download/; RNA-seq (10x Chromium) at ‘animal1’ in ‘all_nuclei_clustered_highlevel_anno.rds’ at https://zenodo.org/records/6390083.
Code availability
The TISSUE Python package and associated code and documentation are available at https://github.com/sunericd/TISSUE/, and all code for generating figures and analyses is separately available at https://github.com/sunericd/tissue-figures-and-analyses/.
References
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Asp, M. et al. A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart. Cell 179, 1647–1660 (2019).
Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514 (2020).
Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods https://doi.org/10.1038/s41592-022-01409-2 (2022).
Wei, R. et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40, 1190–1199 (2022).
Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).
Abdelaal, T., Mourragui, S., Mahfouz, A. & Reinders, M. J. T. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 48, e107 (2020).
Shengquan, C., Boheng, Z., Xiaoyang, C., Xuegong, Z. & Rui, J. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 37, i299–i307 (2021).
Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell 186, 194–208 (2023).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Lopez, R. et al. A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements. ICML Workshop on Computational Biology (2019).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01697-9 (2023).
Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).
Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat. Protoc. 16, 4177–4200 (2021).
Mourragui, S., Loog, M., van de Wiel, M. A., Reinders, M. J. T. & Wessels, L. F. A. PRECISE: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors. Bioinformatics 35, i510–i519 (2019).
Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857–860 (2013).
Langer-Safer, P. R., Levine, M. & Ward, D. C. Immunological method for mapping genes on Drosophila polytene chromosomes. Proc. Natl Acad. Sci. USA 79, 4381–4385 (1982).
Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018).
Alon, S. et al. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 371, eaax2656 (2021).
Wei, X. et al. Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science 377, eabp9444 (2022).
Long, B., Miller, J. & Consortium, T. S. SpaceTx: a roadmap for benchmarking spatial transcriptomics exploration of the brain. Preprint at http://arxiv.org/abs/2301.08436 (2023).
Joglekar, A. et al. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nat. Commun. 12, 463 (2021).
Booeshaghi, A. S. et al. Isoform cell-type specificity in the mouse primary motor cortex. Nature 598, 195–199 (2021).
Gyllborg, D. et al. Hybridization-based in situ sequencing (HybISS) for spatially resolved transcriptomics in human and mouse brain tissue. Nucleic Acids Res. 48, e112 (2020).
Karaiskos, N. et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194–199 (2017).
Nitzan, M., Karaiskos, N., Friedman, N. & Rajewsky, N. Gene expression cartography. Nature 576, 132–137 (2019).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Yao, Z. et al. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell 184, 3222–3241 (2021).
Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).
Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).
Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).
Lust, K. et al. Single-cell analyses of axolotl telencephalon organization, neurogenesis, and regeneration. Science 377, eabp9262 (2022).
Zhou, Y. et al. Single-cell RNA landscape of intratumoral heterogeneity and immunosuppressive microenvironment in advanced osteosarcoma. Nat. Commun. 11, 6322 (2020).
Xia, C., Fan, J., Emanuel, G., Hao, J. & Zhuang, X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc. Natl Acad. Sci. USA 116, 19490–19499 (2019).
Angelopoulos, A. N. & Bates, S. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. Preprint at http://arxiv.org/abs/2107.07511 (2022).
Shafer, G. & Vovk, V. A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008).
Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J. & Wasserman, L. Distribution-free predictive inference for regression. J. Am. Stat. Assoc. 113, 1094–1111 (2018).
Wieslander, H. et al. Deep learning with conformal prediction for hierarchical analysis of large-scale whole-slide tissue images. IEEE J. Biomed. Health Informatics 25, 371–380 (2021).
Alvarsson, J., Arvidsson McShane, S., Norinder, U. & Spjuth, O. Predicting with confidence: using conformal prediction in drug discovery. J. Pharm. Sci. 110, 42–49 (2021).
Jin, Y., Ren, Z. & Candès, E. J. Sensitivity analysis of individual treatment effects: a robust conformal inference approach. Proc. Natl Acad. Sci. USA 120, e2214889120 (2023).
Wang, Y. et al. Sprod for de-noising spatially resolved transcriptomics data based on position and image information. Nat. Methods 19, 950–958 (2022).
Palmer, C. & Pe’er, I. Bias characterization in probabilistic genotype data and improved signal detection with multiple imputation. PLoS Genet. 12, e1006091 (2016).
Allison, P. D. Missing Data https://methods.sagepub.com/book/missing-data (SAGE Publications, 2002).
Little, R. J. A. & Rubin, D. B. Bayes and Multiple Imputation. In Statistical Analysis with Missing Data (eds Little, R. J. A. & Rubin, D. B.) 200–220 (John Wiley & Sons, Inc., 2002); https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119013563.ch10
Licht, C. New methods for generating significance levels from multiply-imputed data. Ph.D. thesis, Otto-Friedrich-Universität Bamberg, Fakultät Sozial- und Wirtschaftswissenschaften https://fis.uni-bamberg.de/handle/uniba/263 (2010).
Zhu, J., Shang, L. & Zhou, X. SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome Biol. 24, 39 (2023).
Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).
Yang, C. B., Kiser, P. J., Zheng, Y. T., Varoqueaux, F. & Mower, G. D. Bidirectional regulation of Munc13-3 protein expression by age and dark rearing during the critical period in mouse visual cortex. Neuroscience 150, 603–608 (2007).
Miller, J. A., Woltjer, R. L., Goodenbour, J. M., Horvath, S. & Geschwind, D. H. Genes and pathways underlying regional and cell type changes in Alzheimer’s disease. Genome Med. 5, 48 (2013).
Artegiani, B. et al. A single-cell RNA sequencing study reveals cellular and molecular dynamics of the hippocampal neurogenic niche. Cell Rep. 21, 3271–3284 (2017).
Siddiqui, T. J. et al. An LRRTM4-HSPG complex mediates excitatory synapse development on dentate gyrus granule cells. Neuron 79, 680–695 (2013).
Buckley, M. T. et al. Cell-type-specific aging clocks to quantify aging and rejuvenation in neurogenic regions of the brain. Nat. Aging 3, 121–137 (2023).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).
Sun, E. D., Ma, R. & Zou, J. Dynamic visualization of high-dimensional data. Nat. Comput. Sci. 3, 86–100 (2023).
Delchambre, L. Weighted principal component analysis: a weighted covariance eigendecomposition approach. Mon. Not. R. Astron. Soc. 446, 3545–3555 (2015).
Navarro Negredo, P., Yeo, R. W. & Brunet, A. Aging and rejuvenation of neural stem cells and their niches. Cell Stem Cell 27, 202–223 (2020).
Doetsch, F. A niche for adult neural stem cells. Curr. Opin. Genet. Dev. 13, 543–550 (2003).
Alvarez-Buylla, A. & García-Verdugo, J. M. Neurogenesis in adult subventricular zone. J. Neurosci. 22, 629–634 (2002).
Dulken, B. W. et al. Single-cell analysis reveals T cell infiltration in old neurogenic niches. Nature 571, 205–210 (2019).
Liu, L. et al. Exercise reprograms the inflammatory landscape of multiple stem cell compartments during mammalian aging. Cell Stem Cell 30, 689–705 (2023).
Cebrian-Silla, A. et al. Single-cell analysis of the ventricular-subventricular zone reveals signatures of dorsal and ventral adult neurogenesis. eLife 10, e67436 (2021).
Chaker, Z., Codega, P. & Doetsch, F. A mosaic world: puzzles revealed by adult neural stem cell heterogeneity. Wiley Interdiscip. Rev. Dev. Biol. 5, 640–658 (2016).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Marshall, A., Altman, D. G., Holder, R. L. & Royston, P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med. Res. Methodol. 9, 57 (2009).
Acknowledgements
Funding support was provided by the Knight-Hennessy Scholars program (to E.D.S.), the Paul and Daisy Soros Fellowship for New Americans (to E.D.S.), the National Science Foundation Graduate Research Fellowship Program (to E.D.S.), D. Donoho at Stanford University (to R.M.), National Institutes of Health P01AG036695 (to A.B.), NSF CAREER 1942926 (to J.Z.), National Institutes of Health P30AG059307 (to J.Z.), 5RM1HG010023 (to J.Z.) and grants from the Silicon Valley Foundation (to J.Z.) and the Chan Zuckerberg Initiative (to J.Z.). We thank L. Xu, O. Zhou and M. Yuksekgonul for helpful discussions.
Author information
Contributions
E.D.S. and J.Z. conceived of the study. E.D.S. designed and implemented the method and ran all associated analyses, with J.Z. and R.M. providing input. P.N.N. and A.B. provided samples for the mouse SVZ MERFISH dataset and input on associated analyses. E.D.S. prepared a draft of the paper. R.M., P.N.N., A.B. and J.Z. edited the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Nancy Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview of datasets and prediction performance.
a, Visualization of cells in the eleven spatial transcriptomics datasets colored by the expression of the highest-expressed gene in each respective dataset. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). b,c, Performance of all three gene prediction methods (Harmony, SpaGE, Tangram) on all datasets as measured by (b) gene-wise mean absolute error between predicted and actual gene expression over 10-fold cross-validation, and (c) gene-wise Pearson correlation between predicted and actual gene expression over 10-fold cross-validation. Shown also are the number of cells (n) in the spatial transcriptomics datasets and the number of genes (p) shared between spatial and RNA-seq datasets. In panels b,c, the inner box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics.
Extended Data Fig. 2 Evidence of gene expression similarity between spatial neighbors.
a, Cosine similarity of gene expression profiles for 250 cells paired with all their neighbors in the TISSUE spatial graph compared to pairings with randomly drawn cells across all eleven spatial transcriptomics datasets. The boxplot corresponds to the quartiles of the cosine similarity measurements. The center line corresponds to median cosine similarity, which was strictly higher in the neighbor-paired comparisons than the random-paired comparisons across all datasets. Whiskers span up to 1.5 times the interquartile range of the metrics and values outside this range are shown as dots. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). b, Scatter plots of the cosine similarities of gene expression profiles for 250 cells paired with their neighbors for either the training gene set or the test gene set determined by random train-test split of all genes (50% train, 50% test). Shown are cosine similarity pairs for 10 train-test splits for the two benchmark spatial transcriptomics datasets with the most measured genes.
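The neighbor-versus-random comparison in panel a can be illustrated with a minimal sketch. It assumes a plain k-nearest-neighbor graph on cell coordinates (15 neighbors) and simulated, spatially smooth expression; the helper names `knn_indices` and `cosine_rows` are hypothetical and are not part of the TISSUE package, and the TISSUE spatial graph itself may be constructed differently.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest spatial neighbors of every cell (self excluded)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def cosine_rows(a, b):
    """Cosine similarity along the last axis, with broadcasting."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den

# Simulated data (assumption): expression varies smoothly over space plus noise.
rng = np.random.default_rng(0)
n_cells, n_genes, k = 250, 30, 15
coords = rng.uniform(size=(n_cells, 2))
loadings = rng.normal(size=(2, n_genes))
phase = rng.uniform(0.0, 2.0 * np.pi, size=n_genes)
expr = np.sin(3.0 * coords @ loadings + phase)
expr += rng.normal(scale=0.1, size=expr.shape)

# Pair every cell with its spatial neighbors and with randomly drawn cells
# (a random draw may occasionally pick the cell itself).
neighbors = knn_indices(coords, k)
random_partners = rng.integers(0, n_cells, size=(n_cells, k))
neighbor_sim = cosine_rows(expr[:, None, :], expr[neighbors])
random_sim = cosine_rows(expr[:, None, :], expr[random_partners])

print("median neighbor similarity:", np.median(neighbor_sim))
print("median random similarity:  ", np.median(random_sim))
```

On real data, `expr` would be the measured cell-by-gene matrix and `coords` the cell centroids; the two similarity arrays are what a boxplot like panel a summarizes.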
Extended Data Fig. 3 Cell-centric variability and calibration score distributions for individual datasets and prediction methods.
a, Pearson correlation of all cell-centric variability measures obtained for different numbers of neighbors in building the TISSUE spatial graph compared to the default setting of 15 neighbors. b, Correlation of cell-centric variability and absolute prediction error shown individually for each dataset and prediction method combination computed over 10-fold cross-validation. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. c, Histograms showing the distribution of Pearson correlations between either gene-wise or cell-wise similarities of prediction errors and similarities of predicted expression values across all spatial transcriptomic datasets and across all prediction methods. d, Distribution of TISSUE calibration scores shown individually for each dataset and prediction method combination ((kg, kc) = (4, 1)). Details on each dataset and prediction method can be found in Methods. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS).
Extended Data Fig. 4 Further evaluation of TISSUE prediction intervals.
a-c, Correlation plots across all dataset and prediction method combinations computed over 10-fold cross-validation for (a) the 67% prediction interval width and absolute prediction error, both normalized by the absolute value of the predicted expression; (b) 50% prediction interval width and absolute prediction error; (c) 80% prediction interval width and absolute prediction error. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. d, Gene-level calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation. Each line corresponds to an independent gene in the spatial transcriptomics dataset. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). e,f, Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation (e) under automated setting of (kg, kc) for stratified grouping; and (f) for two technical replicates of the mouse gastrulation seqFISH dataset with (kg, kc) = (4, 1). The calibration error is annotated for each prediction method (see Methods). g, Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation for the mouse somatosensory cortex osmFISH dataset with different combinations of Sprod de-noising or Sprod-based spatial similarity graph instead of the TISSUE spatial neighbors graph. The calibration error is annotated for each prediction method (see Methods). 
h, Correlation plot of 67% prediction interval width with TISSUE spatial neighbors graph with cosine similarity weighting and 67% prediction interval width with Sprod similarity graph and weighting for the mouse somatosensory cortex osmFISH dataset and all prediction methods computed over 10-fold cross-validation.
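As a rough illustration of what a calibration curve measures, the sketch below builds symmetric split-conformal intervals from absolute residuals on a calibration set and then checks empirical coverage on held-out values at several confidence levels. This is generic split conformal prediction under stated assumptions, not the TISSUE procedure: the cell-centric variability scaling and stratified (kg, kc) grouping are omitted, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def conformal_halfwidth(cal_true, cal_pred, confidence):
    """Half-width of a symmetric split-conformal interval (absolute-residual score)."""
    scores = np.sort(np.abs(np.asarray(cal_true) - np.asarray(cal_pred)))
    n = scores.size
    # Finite-sample rank ceil((n + 1) * confidence), capped at n for simplicity
    # (strictly, the interval is unbounded when the rank exceeds n).
    rank = min(int(np.ceil((n + 1) * confidence)), n)
    return scores[rank - 1]

def empirical_coverage(true, pred, halfwidth):
    """Fraction of held-out values that fall inside pred +/- halfwidth."""
    return float(np.mean(np.abs(np.asarray(true) - np.asarray(pred)) <= halfwidth))

# Simulated "measured" and "predicted" expression (illustrative data only).
rng = np.random.default_rng(0)
measured = rng.gamma(shape=2.0, scale=1.0, size=4000)
predicted = measured + rng.normal(scale=0.5, size=4000)
cal, test = slice(0, 2000), slice(2000, 4000)

confidence_levels = [0.5, 0.67, 0.8, 0.95]
halfwidths = [conformal_halfwidth(measured[cal], predicted[cal], c)
              for c in confidence_levels]
coverages = [empirical_coverage(measured[test], predicted[test], h)
             for h in halfwidths]

# Each (confidence, coverage) pair is one point on a calibration curve.
for c, h, cov in zip(confidence_levels, halfwidths, coverages):
    print(f"confidence={c:.2f}  half-width={h:.3f}  empirical coverage={cov:.3f}")
```

A well-calibrated method yields points close to the diagonal, that is, empirical coverage close to the specified confidence level.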
Extended Data Fig. 5 Additional differential gene expression analysis with TISSUE.
a, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) using the differentially expressed genes on the measured gene expression profiles as the ground truth across different p-value cutoffs. P-values were computed using two-sided t-test. Discoveries are assessed across all genes for all class labels. Shown are results for all three prediction methods and all spatial transcriptomics datasets with cell type or region labels available. All calibration scores were generated with (kg, kc) = (4, 1) settings for stratified grouping. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.). b, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with automated stratified grouping. c, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with (kg, kc) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the ‘greater than’ one-sided Wilcoxon/Mann-Whitney test. d, False discovery rate of spatially variable genes as a function of the number of discoveries and with (kg, kc) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the SpatialDE test. e, Correlation plot of the log p-values obtained from the TISSUE multiple imputation t-test framework between two technical replicates of the mouse gastrulation seqFISH dataset.
Extended Data Fig. 6 Additional experiments for uncertainty-aware supervised learning, clustering, and visualization.
a-c, Downstream task performance metrics on the three most prominent anatomic region class labels for the mouse somatosensory cortex osmFISH dataset. Shown are metrics for all three prediction methods with automated stratified grouping settings. P-value was computed using a paired two-sided t-test on n = 3 independent prediction methods. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics. (a) Accuracy, F1 score, and ROC-AUC (receiver-operator characteristic area under the curve) metrics for logistic regression models trained on the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (b) Adjusted Rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (c) Linear separability measured as classification accuracy of linear kernel support vector classifier fitted on the top 15 principal components obtained from the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. d, Average improvement of performance metrics using TISSUE-filtered approach in lieu of unfiltered approach on predicted expression for supervised learning (Accuracy, F1, ROC-AUC), clustering (adjusted Rand index (ARI)), and visualization (linear separability) for the top three classes across all dataset and class label combinations. Results were obtained using the 50% prediction interval width for filtering. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.). 
Asterisks denote significant difference in performance metrics between TISSUE-filtered approach and unfiltered approach (p<0.05) with p-values computed using a paired two-sided t-test on n=3 independent prediction methods. e, Same as panel d except with the 80% prediction interval width for filtering.
Extended Data Fig. 7 Uncertainty-aware clustering and label separation with TISSUE-WPCA.
a, Schematic illustration of the weighted principal component analysis (WPCA) pipeline where the inverse TISSUE prediction interval width is used to obtain principal components from WPCA, which are then used for downstream tasks of clustering and label separation. b, Linear separability measured as the binary classification accuracy of a linear kernel support vector classifier fitted on the two cell clusters in the simulated spatial transcriptomics data as a function of the simulated mix-in proportion. The classifier was trained on the top 15 principal components obtained from the measured gene expression profiles with PCA, predicted gene expression profiles with PCA, and predicted gene expression profiles with TISSUE-WPCA. For TISSUE-WPCA, weights were determined by binarizing the inverse normalized 67% prediction interval width (see Methods). Results were obtained using automated stratified grouping. Bands represent the interquartile range and solid line denotes the median linear separability across 20 simulated datasets. c, Same as in panel b except with TISSUE-WPCA weighting using the log-transformed inverse normalized 67% prediction interval width. d, Adjusted Rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from PCA on the predicted expression or TISSUE-WPCA on the predicted gene expression for six real spatial transcriptomics dataset and label pairings and all prediction methods. P-value was computed using a paired two-sided t-test on n=18 sets of predictions across 3 independent prediction methods and 6 independent dataset and class label combinations. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics.
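To make the idea of uncertainty-weighted PCA concrete, here is a simplified sketch of a weighted covariance eigendecomposition in the spirit of the weighted PCA approach cited in the references. It is an illustration under assumptions, not the TISSUE-WPCA implementation: the function `weighted_pca` is hypothetical, the weights are simply the inverse of simulated interval widths (no binarization or log transform), and scores are obtained by a plain projection of the weighted, centered data.

```python
import numpy as np

def weighted_pca(X, W, n_components):
    """PCA from a weighted covariance matrix; W holds one positive weight per entry."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    mean = (W * X).sum(axis=0) / W.sum(axis=0)   # weighted mean of each gene
    Xw = W * (X - mean)                          # weighted, centered data
    cov = (Xw.T @ Xw) / (W.T @ W)                # weighted covariance (genes x genes)
    evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    components = evecs[:, order]
    scores = Xw @ components                     # simplified projection onto the PCs
    return scores, evals[order]

# Simulated predicted expression and prediction interval widths (illustrative only).
rng = np.random.default_rng(0)
predicted = rng.normal(size=(200, 12))
interval_width = rng.uniform(0.5, 2.0, size=(200, 12))

# Narrow intervals (more certain predictions) receive larger weights.
weights = 1.0 / interval_width
scores, variances = weighted_pca(predicted, weights, n_components=5)

# With all weights equal to 1 the procedure reduces to ordinary PCA.
unit_scores, unit_variances = weighted_pca(predicted, np.ones_like(predicted), 5)
print(scores.shape, variances)
```

The resulting `scores` could then be passed to k-means or a linear classifier, as in the clustering and label separation tasks described in the caption.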
Extended Data Fig. 8 TISSUE is necessary to identify ambiguous NSC lineage subtype.
a, Heatmap of the scaled log-normalized gene expression of original cell type markers in the adult mouse subventricular zone MERFISH dataset for each of the identified cell type clusters. The Ambiguous cell type cluster in the first row exhibits high expression of qNSC/astrocyte, aNSC/NPC, and neuroblast markers. b, Additional predicted marker genes for the second ambiguous subcluster are differentially expressed for all qNSC/astrocyte and aNSC/NPC markers under traditional hypothesis testing with two-sided t-test on the predicted gene expression (Predicted). With TISSUE multiple imputation two-sided t-test, there are substantially more aNSC/NPC markers that are differentially over-expressed in the ambiguous subcluster (TISSUE), permitting identification of this subcluster as an aNSC/NPC subtype cluster. P-values are shown for all predicted marker genes with significance threshold of Bonferroni-adjusted p < 0.1 for either two-sided t-test or TISSUE multiple imputation two-sided t-test. c, Table indicating whether each of the three cell subtypes of the NSC lineage could be resolved from predicted marker genes using baseline or TISSUE-based approaches. Green checks indicate successful identification of cell subtype and red crosses indicate unsuccessful identification of cell subtype. d, Relative proportion of each of the three TISSUE-identified subtypes in the neural stem cell lineage cluster for either the left or right lateral ventricle. e, Relative proportions of aNSC/NPC and neuroblast populations across the MERFISH dataset and three single-cell RNAseq datasets of the mouse subventricular zone. The qNSC/astrocyte proportions were not compared since they were aggregated with astrocytes of the striatum in the single-cell RNAseq datasets. f, Spatial visualization of the cells in the neural stem cell lineage cluster colored by dorsal or ventral spatial location labels. 
g, Dorsal versus ventral classification performance of TISSUE-filtered penalized logistic regression models and baseline unfiltered penalized logistic regression models evaluated using 10-fold cross-validation across F1 score, accuracy, area under the receiver-operator curve, and average precision.
Extended Data Fig. 9 Computational runtime for TISSUE.
a, Bar plots of total runtimes for spatial gene expression prediction computations over 10 predictions to generate estimated predictions on all calibration genes. Bars denote the mean runtime across 10 instances of TISSUE prediction and each dot represents the runtime for one instance of generating TISSUE predictions using 10-fold cross-validation. b, Bar plots of total runtimes for TISSUE prediction interval calculation including computation of cell-centric variability and calibration score sets. Bars denote the mean runtime across 10 instances of TISSUE prediction interval calculation and each dot represents the runtime for one instance of TISSUE prediction interval calculation.
Supplementary information
Supplementary Information
Supplementary Fig. 1.
Supplementary Table 1
Overview of dataset pairings between spatial transcriptomics and RNA-seq used for TISSUE evaluation.
Supplementary Table 2
Downstream analysis benchmarking performances of TISSUE with different spatial gene expression prediction methods. The table is organized by groups of related downstream analysis benchmarking tasks (rows). The numbers at the end of task descriptions index unique data contexts (for example, dataset, dataset and label combination) within each group of tasks. TISSUE methods (with bold column titles) are compared to non-TISSUE methods (adjacent columns) and the superior performance (if any) is highlighted in green. Each cell in the table constitutes a unique benchmarking context (that is, imputation method, dataset, application and metric).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, E.D., Ma, R., Navarro Negredo, P. et al. TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nat Methods 21, 444–454 (2024). https://doi.org/10.1038/s41592-024-02184-y
DOI: https://doi.org/10.1038/s41592-024-02184-y