Abstract
As methods for measuring spatial gene expression at single-cell resolution become available, there is a need for computational analysis strategies. We present trendsceek, a method based on marked point processes that identifies genes with statistically significant spatial expression trends. trendsceek finds these genes in spatial transcriptomic and sequential fluorescence in situ hybridization data, and also reveals significant gene expression gradients and hot spots in low-dimensional projections of dissociated single-cell RNA-seq data.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$259.00 per year
only $21.58 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Lubeck, E., Coskun, A.F., Zhiyentayev, T., Ahmad, M. & Cai, L. Nat. Methods 11, 360–361 (2014).
Chen, K.H., Boettiger, A.N., Moffitt, J.R., Wang, S. & Zhuang, X. Science 348, aaa6090 (2015).
Shah, S., Lubeck, E., Zhou, W. & Cai, L. Neuron 92, 342–357 (2016).
Ståhl, P.L. et al. Science 353, 78–82 (2016).
Scialdone, A. et al. Nature 535, 289–293 (2016).
van der Maaten, L. & Hinton, G. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. Modelling and Simulation of Stationary Point Processes. Statistical Analysis and Modelling of Spatial Point Patterns 363–444 (John Wiley & Sons, Ltd, 2008).
Hatami, R. et al. Sci. Transl. Med. 5, 169ra12 (2013).
Singha, P.K., Yeh, I.-T., Venkatachalam, M.A. & Saikumar, P. Cancer Res. 70, 6377–6383 (2010).
Wagner, A., Regev, A. & Yosef, N. Nat. Biotechnol. 34, 1145–1160 (2016).
Brennecke, P. et al. Nat. Methods 10, 1093–1095 (2013).
Trapnell, C. et al. Nat. Biotechnol. 32, 381–386 (2014).
Bendall, S.C. et al. Cell 157, 714–725 (2014).
Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F. & Theis, F.J. Nat. Methods 13, 845–848 (2016).
Petropoulos, S. et al. Cell 165, 1012–1026 (2016).
Joost, S. et al. Cell Syst. 3, 221–237 (2016).
Stoyan, D. & Stoyan, H. Fractals, Random Shapes, and Point Fields (John Wiley & Sons Inc, 1994).
Schlather, M., Ribeiro, P.J. & Diggle, P.J. J. R. Stat. Soc. Series B Stat. Methodol. 66, 79–93 (2004).
Cressie, N.A.C. Statistics for Spatial Data (John Wiley & Sons, 1991).
Ripley, B.D. Point processes for the earth sciences. in Quantitative Analysis of Mineral and Energy Resources (eds. Chung, C.F., Fabbri, A.G. & Sinding-Larsen, R.) 301–322 (Springer, 1988).
Ohser, J. Statistics 14, 63–71 (1983).
Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Nat. Methods 11, 740–742 (2014).
Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).
Acknowledgements
This work was supported by the Swedish Research Council (grant 2017-01062 to R.S.), the European Research Council (grant 648842 to R.S.) and the Bert L. and N. Kuggie Vallee Foundation (R.S.).
Author information
Authors and Affiliations
Contributions
D.E. conceived the idea, developed the method, performed the analyses and wrote the manuscript. P.J. performed seqFISH and clustering analyses. R.S. supervised the project and wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Power analysis of trendsceek.
We estimated the power of trendsceek with respect to the identification of genes with an expression hot-spot (A), step-gradient (B), non-radial streak (C) or linear gradients (D). (A-C) Expression values were empirically sampled from seqFISH data. (D) For linear gradients the effect size = highest expression / lowest expression. Sensitivity (y-axis) indicates the proportion of 100 independently simulated genes that was deemed significant (P ≤ 0.05), where cell-positions were randomized for each gene. Vertical lines indicate 95% bootstrap confidence interval of the mean sensitivity from repeated simulations (Methods). (A-D) Sample sizes of 100, 500 and 1000 cells were evaluated with 0.01, 0.05 and 0.1 fraction of spiked cells (A-C).
Supplementary Figure 2 trendsceek sensitivity depends on relative differences and not magnitude scale.
Sensitivity versus fold-change for three different background expression levels. (A) 10 out of 100 cells were spiked in a hotspot region. (B) Sensitivity was calculated as the fraction of 100 genes that were detectable where cell-positions were randomized for each gene. As the lines follow each other it is indicative of that the power is the same regardless of background expression level.
Supplementary Figure 3 Comparing trendsceek results with the identification of variable genes.
Trendsceek was run on the 500 most variable genes, indicated in yellow. Genes deemed significant by trendsceek (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05) are indicated in red. Blue indicates smoothed gene density histogram from all genes. Analyzed datasets were: (A) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 3”, n=269 array-spots), (B) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 12”, n=280 array-spots), (C) spatial transcriptomics breast cancer biopsy (layer 2, n=251 array-spots), and (D) mouse gastrulation single-cell RNA-seq data (n=481 cells). (A-C) The red line illustrates a poisson distribution fit without any overdispersion. (D) The solid red line illustrates a negative binomial fit and the dashed line marks the expected position of genes with biological coefficient of variation = 0.5.
Supplementary Figure 4 Comparison of trendsceek mark-segregation summary statistics.
(A-D) Venn diagrams comparing significant genes identified with different trendsceek summary statistics for four different data sets; spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=269 array-spots) (A), tissue section “replicate 12” (n=280 array-spots) (B), spatial transcriptomics breast cancer (n=251 array-spots) (C), and mouse gastrulation single-cell data (n=481 cells) (D). (E-H) Pair-wise comparison of summary statistics for 500 most variable genes for the four different data sets (same order and sample sizes as for A-D). Blue indicates smoothed gene density histogram. Numbers in lower left triangle of the matrices indicate Pearson correlation between each pair of summary statistics.
Supplementary Figure 5 trendsceek statistics for significant genes from mouse olfactory bulb spatial transcriptomics data.
Marked point pattern statistics for significant genes (mark-correlation P ≤ 0.05, Benjamini-Hochberg adjusted, two-sided) for (A) spatial transcriptomics mouse olfactory bulb tissue section “replicate 3” (n=269 array-spots) and (B) mouse olfactory bulb tissue section “replicate 12” (n=280 array-spots).
Supplementary Figure 6 Spatial expression patterns identified in spatial transcriptomics mouse olfactory bulb data.
Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics mouse olfactory datasets (A, B) tissue section “replicate 3” (n=269 array-spots) and (C, D) tissue section “replicate 12” (n=280 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B, D) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).
Supplementary Figure 7 trendsceek statistics for significant genes from spatial transcriptomics breast cancer data.
Marked point pattern statistics for significant (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial transcriptomics breast cancer data (n=251 array-spots).
Supplementary Figure 8 Spatial expression patterns identified in spatial transcriptomics breast cancer data.
(A) Spatial expression patterns of significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics breast-cancer dataset (layer 2, n=251 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05). (C) Heatmap with genes by row and cells by column, where black indicate that a cell was deemed by trendsceek to be in a region of significantly upregulated expression. Upper bar indicates cell-clusters and left bar gene-clusters (agglomerative hierarchical clustering, Euclidean distance, Ward’s criterion). (D) Spatial layout of the cell- and gene-clusters identified in (C).
Supplementary Figure 9 trendsceek statistics for significant genes from mouse gastrulation single-cell RNA-seq data.
Marked point pattern statistics for significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq data (n=481 cells).
Supplementary Figure 10 Spatial expression patterns identified in mouse gastrulation single-cell RNA-seq data.
Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq dataset (n=481 cells). Expression levels were scaled to the range [0-1] for visualisation purposes. (B) Red circles in the density plots indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).
Supplementary Figure 11 trendsceek statistics for selected significant genes from mouse hippocampus seqFISH data.
Marked point pattern statistics for selected significant genes from seqFISH data from mouse hippocampus regions (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Scatter- and density-plots for the same genes are shown in Supplementary Figure 12. Cell positions were scaled to [0-1].
Supplementary Figure 12 Spatial gene expression patterns found in regions of mouse hippocampus from seqFISH data.
(A) Cartoon of hippocampus with the 21 imaged regions labeled. (B) Visualization of the 15 hippocampus regions where significant spatial gene expression patterns were identified with trendsceek (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Colors indicate cell-clusters identified by clustering the binary cell-detection matrix output by trendsceek. The number after the image region label represents the number of cells. (C) Representative gene examples of spatial gene expression patterns from each region. Expression was scaled to the range zero to one by unity-based normalization. Cells are colored red if they exceeded a 5% significance level (based on wKDE, indicated by the blue surface, one-sided). The number after the image region label represents the image field number provided in the data source.
Supplementary Figure 13 Comparison of genes identified by trendsceek with genes from a differential expression analysis using cell cluster information.
Differential expression analysis was performed for seven different spatial patterns, where cell groups were derived by trendsceek from genes representing the pattern (Methods). (A) Fold-change versus SCDE p-value (two-sided test using the multiple testing corrected (Holm procedure) Z-score) for all genes deemed significant by SCDE among the same set of 500 genes that was input to trendsceek (P ≤ 0.05, Benjamini-Hochberg adjusted, fold-change ≥ 1.5). Red points indicate genes identified by trendsceek. (B) Number of significant genes for SCDE and trendsceek for the seven spatial patterns (see Figure 2 for the gene patterns). Percentage indicates fraction of SCDE-significant genes also identified by trendsceek.
Supplementary Figure 14 Comparison of gene rankings between trendsceek and gene variability.
The ranking of genes identified by trendsceek (mark-correlation ranks) was compared to spatially unaware gene variability rankings (Methods) for spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=35 genes) (A), tissue section “replicate 12” (n=45 genes) (B), spatial transcriptomics breast cancer layer 2 (n=15 genes) (C) and mouse gastrulation single-cell RNA-seq data (n=107 genes) (D). Horizontal lines indicate the top 50 ranked variable genes.
Supplementary Figure 15 Runtime of trendsceek on different data sets.
(A) Runtime against number of most variable genes in mouse olfactory bulb dataset replicate 12 (n=280 array-spots). (B) Runtime for the 500 most variable genes in each of three datasets, spatial transcriptomics data from mouse olfactory bulb replicate 12 (ST-MOB, n=280 array-spots), breast cancer layer 2 (ST-BC, n=251 array-spots) and scRNA-seq mouse epiblast cells (SC-MEPI, n=481 cells). The dashed line indicates a quadratic function. Trendsceek is implemented to be run in parallel by specifying an “ncores” argument and each run was done using 70 cores (Intel Xeon 2.3GHz).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–15
Supplementary Table 1
Number of significant genes identified with the different spatial trend metrics
Supplementary Table 2
List of genes with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 3)
Supplementary Table 3
List of cells with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 3)
Supplementary Table 4
List of genes with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 12)
Supplementary Table 5
List of cells with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 12)
Supplementary Table 6
List of genes with significant spatial expression patterns in spatial transcriptomics data from breast cancer (layer 2)
Supplementary Table 7
List of cells with significant spatial expression patterns in spatial transcriptomics data from breast cancer (layer 2)
Supplementary Table 8
List of genes with significant spatial expression patterns in dissociated single-cell RNA-seq data from mouse embryo
Supplementary Table 9
List of cells with significant spatial expression patterns in dissociated single-cell RNA-seq data from mouse embryo
Supplementary Table 10
List of genes with significant spatial expression patterns in seqFISH data from mouse hippocampus
Supplementary Table 11
List of cells with significant spatial expression patterns in seqFISH data from mouse hippocampus
Supplementary Software
trendsceek package
Source data
Rights and permissions
About this article
Cite this article
Edsgärd, D., Johnsson, P. & Sandberg, R. Identification of spatial expression trends in single-cell gene expression data. Nat Methods 15, 339–342 (2018). https://doi.org/10.1038/nmeth.4634
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/nmeth.4634
This article is cited by
-
Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions
Genome Biology (2024)
-
A unified model-based framework for doublet or multiplet detection in single-cell multiomics data
Nature Communications (2024)
-
Single-cell omics: experimental workflow, data analyses and applications
Science China Life Sciences (2024)
-
Spatial transcriptomics: recent developments and insights in respiratory research
Military Medical Research (2023)
-
Decoding the tumor microenvironment with spatial technologies
Nature Immunology (2023)