Abstract
Spatially resolved transcriptomics (SRT) provide gene expression close to, or even superior to, single-cell resolution while retaining the physical locations of sequencing and often also providing matched pathology images. However, SRT expression data suffer from high noise levels, due to the shallow coverage in each sequencing unit and the extra experimental steps required to preserve the locations of sequencing. Fortunately, such noise can be removed by leveraging information from the physical locations of sequencing, and the tissue organization reflected in corresponding pathology images. In this work, we developed Sprod, based on latent graph learning of matched location and imaging data, to impute accurate SRT gene expression. We validated Sprod comprehensively and demonstrated its advantages over previous methods for removing drop-outs in single-cell RNA-sequencing data. We showed that, after imputation by Sprod, differential expression analyses, pathway enrichment and cell-to-cell interaction inferences are more accurate. Overall, we envision de-noising by Sprod to become a key first step towards empowering SRT technologies for biomedical discoveries.
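As a rough illustration of the idea described above, the sketch below pulls each spot's expression toward the mean of its nearest neighbors in a combined location and image-feature space. It is not Sprod's latent graph learning: the k-nearest-neighbor graph, the `alpha` weight, the toy data and the function name `knn_smooth` are all assumptions made for illustration only.

```python
import math

def knn_smooth(features, expression, k=2, alpha=0.5):
    """Blend each spot's expression with the mean of its k nearest neighbors.

    features:   per-spot feature vectors (e.g. x, y, image features); hypothetical input
    expression: per-spot expression values for one gene
    alpha:      weight kept on the spot's own value (assumed parameter)
    Assumes at least two spots and k >= 1.
    """
    n = len(features)
    smoothed = []
    for i in range(n):
        # Sort all other spots by Euclidean distance in feature space.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        neighbors = others[:k]
        neighbor_mean = sum(expression[j] for j in neighbors) / len(neighbors)
        smoothed.append(alpha * expression[i] + (1 - alpha) * neighbor_mean)
    return smoothed

# Toy example: two tight groups of spots, with one drop-out (0.0) in the first group.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
expression = [4.0, 0.0, 4.0, 1.0, 1.0, 1.0]
denoised = knn_smooth(features, expression, k=2, alpha=0.5)
```

In this toy case the drop-out spot is raised toward its neighbors while the second, homogeneous group is left unchanged. A fixed neighbor graph like this one blurs sharp tissue boundaries, which is one reason a learned graph may be preferable.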
Data availability
The Visium datasets are obtained from the public 10X resources/datasets website: https://www.10xgenomics.com/resources/datasets. The IDs of the datasets are: human-lymph-node-1-standard-1-1-0, Human-ovarian-cancer-whole-transcriptome-analysis-stains-dapi-anti-pan-ck-anti-cd-45-1-standard-1-2-0, human-ovarian-cancer-targeted-pan-cancer-panel-stains-dapi-anti-pan-ck-anti-cd-45-1-standard-1-2-0 and human-breast-cancer-block-a-section-1-1-standard-1-1-0. The ID of the standard 10X scRNA-seq dataset used in Fig. 1c is 10-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-single-indexed-4.0.0. The Slide-Seq data are available from the publicly archived data by Stickels et al. (ref. 1). Specifically, we used the Puck_200115_08 data from https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2.
Code availability
The Sprod software is available at: https://github.com/yunguan-wang/SPROD. The DOI is https://doi.org/10.5281/zenodo.6047752 (ref. 29).
References
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019).
Cho, C.-S. et al. Microscopic examination of spatial transcriptome using Seq-Scope. Cell 184, 3559–3572.e22 (2021).
Lee, Y. et al. XYZeq: spatially resolved single-cell RNA sequencing reveals expression heterogeneity in the tumor microenvironment. Sci. Adv. 7, eabg4755 (2021).
Li, W. V. & Li, J. J. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat. Commun. 9, 997 (2018).
Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).
Lu, T. et al. Overcoming expressional drop-outs in lineage reconstruction from single-cell RNA-sequencing data. Cell Rep. 34, 108589 (2021).
Nakagawa, T., Yamada, M. & Suzuki, Y. 18F-FDG uptake in reactive neck lymph nodes of oral cancer: relationship to lymphoid follicles. J. Nucl. Med. 49, 1053–1059 (2008).
Weller, S. et al. Human blood IgM ‘memory’ B cells are circulating splenic marginal zone B cells harboring a prediversified immunoglobulin repertoire. Blood 104, 3647–3654 (2004).
Agbay, R. L. M. C. et al. Characteristics and clinical implications of reactive germinal centers in the bone marrow. Hum. Pathol. 68, 7–21 (2017).
Mayford, M., Baranes, D., Podsypanina, K. & Kandel, E. R. The 3′-untranslated region of CaMKII alpha is a cis-acting signal for the localization and translation of mRNA in dendrites. Proc. Natl Acad. Sci. USA 93, 13250–13255 (1996).
Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).
Tushev, G. et al. Alternative 3′ UTRs modify the localization, regulatory potential, stability, and plasticity of mRNAs in neuronal compartments. Neuron 98, 495–511.e6 (2018).
Ainsley, J. A., Drane, L., Jacobs, J., Kittelberger, K. A. & Reijmers, L. G. Functionally diverse dendritic mRNAs rapidly associate with ribosomes following a novel experience. Nat. Commun. 5, 4510 (2014).
Wang, H., Wu, X. & Chen, Y. Stromal-immune score-based gene signature: a prognosis stratification tool in gastric cancer. Front. Oncol. 9, 1212 (2019).
Wang, T. et al. An empirical approach leveraging tumorgrafts to dissect the tumor microenvironment in renal cell carcinoma identifies missing link to prognostic inflammatory factors. Cancer Discov. 8, 1142–1155 (2018).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Planes-Laine, G. et al. PD-1/PD-L1 targeting in breast cancer: the first clinical evidences are emerging. a literature review. Cancers (Basel) 11, 1033 (2019).
Yuan, C. et al. Expression of PD-1/PD-L1 in primary breast tumours and metastatic axillary lymph nodes and its correlation with clinicopathological parameters. Sci. Rep. 9, 14356 (2019).
Li, C.-J., Lin, L.-T., Hou, M.-F. & Chu, P.-Y. PD‑L1/PD‑1 blockade in breast cancer: the immunotherapy era (Review). Oncol. Rep. 45, 5–12 (2021).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Wang, L. & Li, R.-C. Learning low-dimensional latent graph structures: a density estimation approach. IEEE Trans. Neural Netw. Learn. Syst. 31, 1098–1112 (2020).
Zhang, R., Atwal, G. S. & Lim, W. K. Noise regularization removes correlation artifacts in single-cell RNA-seq data preprocessing. Patterns (N. Y.) 2, 100211 (2021).
Andrews, T. S. & Hemberg, M. False signals induced by single-cell imputation. [version 2; peer review: 4 approved]. F1000Res. 7, 1740 (2018).
Haralick, R. M., Shanmugam, K. & Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3, 610–621 (1973).
Zhang, Z., Xiong, D., Wang, X., Liu, H. & Wang, T. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat. Methods 18, 92–99 (2021).
Adossa, N. A., Rytkönen, K. T. & Elo, L. L. Dirichlet process mixture models for single-cell RNA-seq clustering. Biol. Open 11, bio059001 (2022).
Acknowledgements
We acknowledge the ENCODE Consortium and the ENCODE production laboratories that generated the eCLIP datasets used in our study. We acknowledge J. Johnson for providing input on the interpretation of the mouse Slide-Seq data. This study was supported by the National Institutes of Health (NIH) (5P30CA142543 to T.W., G.X. and Y.X., 1R01CA258584 to T.W., U01AI156189 to T.W. and Y.X., R01DE030656 to G.X., R01GM141519 to G.X., R01GM140012 to G.X., U01CA249245 to G.X., R35GM136375 to Y.X., 2P50CA070907 to T.W., Y.X. and G.X., R01AG075582 to L.W., 3U01AI156189-01S1 to T.W.), National Science Foundation (NSF DMS-2009689 to L.W.), and Cancer Prevention Research Institute of Texas (CPRIT RP190208 to T.W. and RP190107 to G.X.).
Author information
Contributions
Y.W. and B.S. implemented the software and contributed bioinformatics analyses. L.W. and T.W. designed the model. M.C. provided input on the interpretation of the pathology analyses. Y.X., S.W. and G.X. provided input on the analyses and the writing. T.W. supervised the whole study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Nikos Karaiskos and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: Lin Tang, in collaboration with the Nature Methods team.
Extended data
Extended Data Fig. 1 Correlation between PTPRC RNA expression (targeted) and CD45 protein expression (IF).
Red arrows indicate that the results are restricted to spots of higher quality. Yellow arrows indicate that the single gene PTPRC is replaced by a PTPRC signature that also includes correlated genes.
Extended Data Fig. 3 Gene expression clustering of the beads in the mouse brain Slide-Seq dataset.
Gene expression clustering of the beads in the mouse brain Slide-Seq dataset reflects the multi-cellular structures of mouse brain hippocampus. a, Slide-seq dataset puck 200306_03 and b, puck 200115_08 by Stickels et al. (ref. 1).
Extended Data Fig. 4 Deviances between CD45 IF intensities and the expression levels of PTPRC (left: original, right: denoised).
CD45 IF intensities and PTPRC expression values were normalized and distributionally warped to the same scale so they can be directly compared. The differences between CD45 IF and PTPRC on each spot are denoted by color. Red refers to small differences and green refers to larger differences.
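The legend does not specify which warping was used. One simple way to bring two measurements onto the same scale is quantile matching: each expression value is replaced by the IF intensity that holds the same rank, after which per-spot differences can be taken directly. The sketch below assumes this approach; the function names `quantile_match` and `per_spot_deviation` and the toy numbers are hypothetical.

```python
def quantile_match(source, reference):
    """Warp `source` so that its sorted values equal the sorted `reference` values.

    Ties in `source` are broken by input order (an arbitrary choice in this sketch).
    """
    if len(source) != len(reference):
        raise ValueError("this sketch assumes equal-length inputs")
    order = sorted(range(len(source)), key=lambda i: source[i])
    ref_sorted = sorted(reference)
    warped = [0.0] * len(source)
    for rank, idx in enumerate(order):
        # The spot with the rank-th smallest source value gets the
        # rank-th smallest reference value.
        warped[idx] = ref_sorted[rank]
    return warped

def per_spot_deviation(protein, rna):
    """Absolute difference per spot after warping `rna` onto the `protein` scale."""
    warped = quantile_match(rna, protein)
    return [abs(p - w) for p, w in zip(protein, warped)]

# Made-up IF intensities and expression counts on four spots.
cd45_if = [120.0, 300.0, 80.0, 500.0]
ptprc = [1.0, 3.0, 0.0, 2.0]
deviation = per_spot_deviation(cd45_if, ptprc)
```

Spots where the two measurements hold the same rank get a deviation of zero; spots whose ranks disagree get a deviation expressed in IF-intensity units.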
Extended Data Fig. 5 Spatial IgD expression of the raw Visium data (left) and the Sprod-adjusted data (right).
The red circles mark the mantle zone to be highlighted in Fig. 3d.
Extended Data Fig. 6 Spearman correlations between IgD and CD3/CD20/CD1c for the human lymph node Visium dataset.
Results are shown for the original expression data, SAVER/scImpute-corrected data, the Sprod-corrected data, and the Sprod-corrected data with image/location information scrambled.
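For reference, a Spearman correlation is the Pearson correlation computed on ranks. The minimal sketch below shows that calculation for two marker vectors measured on the same spots; the toy values and the function names `average_ranks` and `spearman` are illustrative and not taken from the paper.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up expression values for two genes across five spots.
gene_a = [0.0, 1.0, 2.0, 5.0, 9.0]
gene_b = [0.1, 0.4, 0.3, 2.0, 7.0]
rho = spearman(gene_a, gene_b)
```

Because only ranks enter the calculation, the statistic is insensitive to monotone rescaling of either gene, which makes it convenient for comparing raw and corrected expression values.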
Extended Data Fig. 7 Extraction of four tumor regions.
The four tumor regions (blue, green, orange and red) that were extracted, according to expressional clustering and concordance with the H&E stained slide.
Supplementary information
Supplementary Information
Supplementary Notes 1 and 2 and Table 1.
Supplementary Table
Supplementary Table 2.
Supplementary Software
Sprod v.1.0 was provided with this publication for documentation purposes.
About this article
Cite this article
Wang, Y., Song, B., Wang, S. et al. Sprod for de-noising spatially resolved transcriptomics data based on position and image information. Nat Methods 19, 950–958 (2022). https://doi.org/10.1038/s41592-022-01560-w
DOI: https://doi.org/10.1038/s41592-022-01560-w