Identification of spatial expression trends in single-cell gene expression data

Edsgärd, Daniel; Johnsson, Per; Sandberg, Rickard

doi:10.1038/nmeth.4634

Brief Communication
Published: 19 March 2018

Identification of spatial expression trends in single-cell gene expression data

Nature Methods volume 15, pages 339–342 (2018)Cite this article

16k Accesses
154 Citations
43 Altmetric
Metrics details

Subjects

Abstract

As methods for measuring spatial gene expression at single-cell resolution become available, there is a need for computational analysis strategies. We present trendsceek, a method based on marked point processes that identifies genes with statistically significant spatial expression trends. trendsceek finds these genes in spatial transcriptomic and sequential fluorescence in situ hybridization data, and also reveals significant gene expression gradients and hot spots in low-dimensional projections of dissociated single-cell RNA-seq data.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Figure 1: trendsceek detects spatial gene expression trends in simulated data.**

**Figure 2: Application of trendsceek to spatial and single-cell gene expression data.**

Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies

Article 27 January 2020

Shiquan Sun, Jiaqiang Zhu & Xiang Zhou

A universal tool for predicting differentially active features in single-cell and spatial genomics data

Article Open access 22 July 2023

Alexis Vandenbon & Diego Diez

nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes

Article Open access 10 July 2023

Lukas M. Weber, Arkajyoti Saha, … Stephanie C. Hicks

References

Lubeck, E., Coskun, A.F., Zhiyentayev, T., Ahmad, M. & Cai, L. Nat. Methods 11, 360–361 (2014).
Article CAS Google Scholar
Chen, K.H., Boettiger, A.N., Moffitt, J.R., Wang, S. & Zhuang, X. Science 348, aaa6090 (2015).
Article Google Scholar
Shah, S., Lubeck, E., Zhou, W. & Cai, L. Neuron 92, 342–357 (2016).
Article CAS Google Scholar
Ståhl, P.L. et al. Science 353, 78–82 (2016).
Article Google Scholar
Scialdone, A. et al. Nature 535, 289–293 (2016).
Article CAS Google Scholar
van der Maaten, L. & Hinton, G. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Google Scholar
Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. Modelling and Simulation of Stationary Point Processes. Statistical Analysis and Modelling of Spatial Point Patterns 363–444 (John Wiley & Sons, Ltd, 2008).
Hatami, R. et al. Sci. Transl. Med. 5, 169ra12 (2013).
Article Google Scholar
Singha, P.K., Yeh, I.-T., Venkatachalam, M.A. & Saikumar, P. Cancer Res. 70, 6377–6383 (2010).
Article CAS Google Scholar
Wagner, A., Regev, A. & Yosef, N. Nat. Biotechnol. 34, 1145–1160 (2016).
Article CAS Google Scholar
Brennecke, P. et al. Nat. Methods 10, 1093–1095 (2013).
Article CAS Google Scholar
Trapnell, C. et al. Nat. Biotechnol. 32, 381–386 (2014).
Article CAS Google Scholar
Bendall, S.C. et al. Cell 157, 714–725 (2014).
Article CAS Google Scholar
Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F. & Theis, F.J. Nat. Methods 13, 845–848 (2016).
Article CAS Google Scholar
Petropoulos, S. et al. Cell 165, 1012–1026 (2016).
Article CAS Google Scholar
Joost, S. et al. Cell Syst. 3, 221–237 (2016).
Article CAS Google Scholar
Stoyan, D. & Stoyan, H. Fractals, Random Shapes, and Point Fields (John Wiley & Sons Inc, 1994).
Schlather, M., Ribeiro, P.J. & Diggle, P.J. J. R. Stat. Soc. Series B Stat. Methodol. 66, 79–93 (2004).
Article Google Scholar
Cressie, N.A.C. Statistics for Spatial Data (John Wiley & Sons, 1991).
Ripley, B.D. Point processes for the earth sciences. in Quantitative Analysis of Mineral and Energy Resources (eds. Chung, C.F., Fabbri, A.G. & Sinding-Larsen, R.) 301–322 (Springer, 1988).
Ohser, J. Statistics 14, 63–71 (1983).
Google Scholar
Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Nat. Methods 11, 740–742 (2014).
CAS PubMed PubMed Central Google Scholar
Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).
Article Google Scholar

Download references

Acknowledgements

This work was supported by the Swedish Research Council (grant 2017-01062 to R.S.), the European Research Council (grant 648842 to R.S.) and the Bert L. and N. Kuggie Vallee Foundation (R.S.).

Author information

Authors and Affiliations

Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
Daniel Edsgärd, Per Johnsson & Rickard Sandberg
Ludwig Institute for Cancer Research, Stockholm, Sweden
Daniel Edsgärd, Per Johnsson & Rickard Sandberg

Authors

Daniel Edsgärd
View author publications
You can also search for this author in PubMed Google Scholar
Per Johnsson
View author publications
You can also search for this author in PubMed Google Scholar
Rickard Sandberg
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

D.E. conceived the idea, developed the method, performed the analyses and wrote the manuscript. P.J. performed seqFISH and clustering analyses. R.S. supervised the project and wrote the manuscript.

Corresponding author

Correspondence to Rickard Sandberg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Power analysis of trendsceek.

We estimated the power of trendsceek with respect to the identification of genes with an expression hot-spot (A), step-gradient (B), non-radial streak (C) or linear gradients (D). (A-C) Expression values were empirically sampled from seqFISH data. (D) For linear gradients the effect size = highest expression / lowest expression. Sensitivity (y-axis) indicates the proportion of 100 independently simulated genes that was deemed significant (P ≤ 0.05), where cell-positions were randomized for each gene. Vertical lines indicate 95% bootstrap confidence interval of the mean sensitivity from repeated simulations (Methods). (A-D) Sample sizes of 100, 500 and 1000 cells were evaluated with 0.01, 0.05 and 0.1 fraction of spiked cells (A-C).

Supplementary Figure 2 trendsceek sensitivity depends on relative differences and not magnitude scale.

Sensitivity versus fold-change for three different background expression levels. (A) 10 out of 100 cells were spiked in a hotspot region. (B) Sensitivity was calculated as the fraction of 100 genes that were detectable where cell-positions were randomized for each gene. As the lines follow each other it is indicative of that the power is the same regardless of background expression level.

Supplementary Figure 3 Comparing trendsceek results with the identification of variable genes.

Trendsceek was run on the 500 most variable genes, indicated in yellow. Genes deemed significant by trendsceek (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05) are indicated in red. Blue indicates smoothed gene density histogram from all genes. Analyzed datasets were: (A) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 3”, n=269 array-spots), (B) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 12”, n=280 array-spots), (C) spatial transcriptomics breast cancer biopsy (layer 2, n=251 array-spots), and (D) mouse gastrulation single-cell RNA-seq data (n=481 cells). (A-C) The red line illustrates a poisson distribution fit without any overdispersion. (D) The solid red line illustrates a negative binomial fit and the dashed line marks the expected position of genes with biological coefficient of variation = 0.5.

Supplementary Figure 4 Comparison of trendsceek mark-segregation summary statistics.

(A-D) Venn diagrams comparing significant genes identified with different trendsceek summary statistics for four different data sets; spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=269 array-spots) (A), tissue section “replicate 12” (n=280 array-spots) (B), spatial transcriptomics breast cancer (n=251 array-spots) (C), and mouse gastrulation single-cell data (n=481 cells) (D). (E-H) Pair-wise comparison of summary statistics for 500 most variable genes for the four different data sets (same order and sample sizes as for A-D). Blue indicates smoothed gene density histogram. Numbers in lower left triangle of the matrices indicate Pearson correlation between each pair of summary statistics.

Supplementary Figure 5 trendsceek statistics for significant genes from mouse olfactory bulb spatial transcriptomics data.

Marked point pattern statistics for significant genes (mark-correlation P ≤ 0.05, Benjamini-Hochberg adjusted, two-sided) for (A) spatial transcriptomics mouse olfactory bulb tissue section “replicate 3” (n=269 array-spots) and (B) mouse olfactory bulb tissue section “replicate 12” (n=280 array-spots).

Supplementary Figure 6 Spatial expression patterns identified in spatial transcriptomics mouse olfactory bulb data.

Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics mouse olfactory datasets (A, B) tissue section “replicate 3” (n=269 array-spots) and (C, D) tissue section “replicate 12” (n=280 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B, D) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).

Supplementary Figure 7 trendsceek statistics for significant genes from spatial transcriptomics breast cancer data.

Marked point pattern statistics for significant (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial transcriptomics breast cancer data (n=251 array-spots).

Supplementary Figure 8 Spatial expression patterns identified in spatial transcriptomics breast cancer data.

(A) Spatial expression patterns of significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics breast-cancer dataset (layer 2, n=251 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05). (C) Heatmap with genes by row and cells by column, where black indicate that a cell was deemed by trendsceek to be in a region of significantly upregulated expression. Upper bar indicates cell-clusters and left bar gene-clusters (agglomerative hierarchical clustering, Euclidean distance, Ward’s criterion). (D) Spatial layout of the cell- and gene-clusters identified in (C).

Supplementary Figure 9 trendsceek statistics for significant genes from mouse gastrulation single-cell RNA-seq data.

Marked point pattern statistics for significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq data (n=481 cells).

Supplementary Figure 10 Spatial expression patterns identified in mouse gastrulation single-cell RNA-seq data.

Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq dataset (n=481 cells). Expression levels were scaled to the range [0-1] for visualisation purposes. (B) Red circles in the density plots indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).

Supplementary Figure 11 trendsceek statistics for selected significant genes from mouse hippocampus seqFISH data.

Marked point pattern statistics for selected significant genes from seqFISH data from mouse hippocampus regions (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Scatter- and density-plots for the same genes are shown in Supplementary Figure 12. Cell positions were scaled to [0-1].

Supplementary Figure 12 Spatial gene expression patterns found in regions of mouse hippocampus from seqFISH data.

(A) Cartoon of hippocampus with the 21 imaged regions labeled. (B) Visualization of the 15 hippocampus regions where significant spatial gene expression patterns were identified with trendsceek (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Colors indicate cell-clusters identified by clustering the binary cell-detection matrix output by trendsceek. The number after the image region label represents the number of cells. (C) Representative gene examples of spatial gene expression patterns from each region. Expression was scaled to the range zero to one by unity-based normalization. Cells are colored red if they exceeded a 5% significance level (based on wKDE, indicated by the blue surface, one-sided). The number after the image region label represents the image field number provided in the data source.

Supplementary Figure 13 Comparison of genes identified by trendsceek with genes from a differential expression analysis using cell cluster information.

Differential expression analysis was performed for seven different spatial patterns, where cell groups were derived by trendsceek from genes representing the pattern (Methods). (A) Fold-change versus SCDE p-value (two-sided test using the multiple testing corrected (Holm procedure) Z-score) for all genes deemed significant by SCDE among the same set of 500 genes that was input to trendsceek (P ≤ 0.05, Benjamini-Hochberg adjusted, fold-change ≥ 1.5). Red points indicate genes identified by trendsceek. (B) Number of significant genes for SCDE and trendsceek for the seven spatial patterns (see Figure 2 for the gene patterns). Percentage indicates fraction of SCDE-significant genes also identified by trendsceek.

Supplementary Figure 14 Comparison of gene rankings between trendsceek and gene variability.

The ranking of genes identified by trendsceek (mark-correlation ranks) was compared to spatially unaware gene variability rankings (Methods) for spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=35 genes) (A), tissue section “replicate 12” (n=45 genes) (B), spatial transcriptomics breast cancer layer 2 (n=15 genes) (C) and mouse gastrulation single-cell RNA-seq data (n=107 genes) (D). Horizontal lines indicate the top 50 ranked variable genes.

Supplementary Figure 15 Runtime of trendsceek on different data sets.

(A) Runtime against number of most variable genes in mouse olfactory bulb dataset replicate 12 (n=280 array-spots). (B) Runtime for the 500 most variable genes in each of three datasets, spatial transcriptomics data from mouse olfactory bulb replicate 12 (ST-MOB, n=280 array-spots), breast cancer layer 2 (ST-BC, n=251 array-spots) and scRNA-seq mouse epiblast cells (SC-MEPI, n=481 cells). The dashed line indicates a quadratic function. Trendsceek is implemented to be run in parallel by specifying an “ncores” argument and each run was done using 70 cores (Intel Xeon 2.3GHz).

Source data

Source data to Fig. 1

Rights and permissions

Reprints and permissions

About this article

Cite this article

Edsgärd, D., Johnsson, P. & Sandberg, R. Identification of spatial expression trends in single-cell gene expression data. Nat Methods 15, 339–342 (2018). https://doi.org/10.1038/nmeth.4634

Download citation

Received: 15 June 2017
Accepted: 20 February 2018
Published: 19 March 2018
Issue Date: 01 May 2018
DOI: https://doi.org/10.1038/nmeth.4634

This article is cited by

Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions
- Kaishu Mason
- Anuja Sathe
- Nancy Zhang
Genome Biology (2024)
Bioinformatics in urology — molecular characterization of pathophysiology and response to treatment
- Ali Hashemi Gheinani
- Jina Kim
- Rosalyn M. Adam
Nature Reviews Urology (2024)
PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics
- Yuchen Liang
- Guowei Shi
- Zhonghui Tang
Nature Communications (2024)
SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics
- Jiaqiang Zhu
- Lulu Shang
- Xiang Zhou
Genome Biology (2023)
Advances in spatial transcriptomics and related data analysis strategies
- Jun Du
- Yu-Chen Yang
- Jian Hou
Journal of Translational Medicine (2023)