Identification of spatial expression trends in single-cell gene expression data

Abstract

As methods for measuring spatial gene expression at single-cell resolution become available, there is a need for computational analysis strategies. We present trendsceek, a method based on marked point processes that identifies genes with statistically significant spatial expression trends. trendsceek finds these genes in spatial transcriptomic and sequential fluorescence in situ hybridization data, and also reveals significant gene expression gradients and hot spots in low-dimensional projections of dissociated single-cell RNA-seq data.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.

from$8.99

All prices are NET prices.

Figure 1: trendsceek detects spatial gene expression trends in simulated data.
Figure 2: Application of trendsceek to spatial and single-cell gene expression data.

References

  1. 1

    Lubeck, E., Coskun, A.F., Zhiyentayev, T., Ahmad, M. & Cai, L. Nat. Methods 11, 360–361 (2014).

    CAS  Article  Google Scholar 

  2. 2

    Chen, K.H., Boettiger, A.N., Moffitt, J.R., Wang, S. & Zhuang, X. Science 348, aaa6090 (2015).

    Article  Google Scholar 

  3. 3

    Shah, S., Lubeck, E., Zhou, W. & Cai, L. Neuron 92, 342–357 (2016).

    CAS  Article  Google Scholar 

  4. 4

    Ståhl, P.L. et al. Science 353, 78–82 (2016).

    Article  Google Scholar 

  5. 5

    Scialdone, A. et al. Nature 535, 289–293 (2016).

    CAS  Article  Google Scholar 

  6. 6

    van der Maaten, L. & Hinton, G. J. Mach. Learn. Res. 9, 2579–2605 (2008).

    Google Scholar 

  7. 7

    Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. Modelling and Simulation of Stationary Point Processes. Statistical Analysis and Modelling of Spatial Point Patterns 363–444 (John Wiley & Sons, Ltd, 2008).

  8. 8

    Hatami, R. et al. Sci. Transl. Med. 5, 169ra12 (2013).

    Article  Google Scholar 

  9. 9

    Singha, P.K., Yeh, I.-T., Venkatachalam, M.A. & Saikumar, P. Cancer Res. 70, 6377–6383 (2010).

    CAS  Article  Google Scholar 

  10. 10

    Wagner, A., Regev, A. & Yosef, N. Nat. Biotechnol. 34, 1145–1160 (2016).

    CAS  Article  Google Scholar 

  11. 11

    Brennecke, P. et al. Nat. Methods 10, 1093–1095 (2013).

    CAS  Article  Google Scholar 

  12. 12

    Trapnell, C. et al. Nat. Biotechnol. 32, 381–386 (2014).

    CAS  Article  Google Scholar 

  13. 13

    Bendall, S.C. et al. Cell 157, 714–725 (2014).

    CAS  Article  Google Scholar 

  14. 14

    Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F. & Theis, F.J. Nat. Methods 13, 845–848 (2016).

    CAS  Article  Google Scholar 

  15. 15

    Petropoulos, S. et al. Cell 165, 1012–1026 (2016).

    CAS  Article  Google Scholar 

  16. 16

    Joost, S. et al. Cell Syst. 3, 221–237 (2016).

    CAS  Article  Google Scholar 

  17. 17

    Stoyan, D. & Stoyan, H. Fractals, Random Shapes, and Point Fields (John Wiley & Sons Inc, 1994).

  18. 18

    Schlather, M., Ribeiro, P.J. & Diggle, P.J. J. R. Stat. Soc. Series B Stat. Methodol. 66, 79–93 (2004).

    Article  Google Scholar 

  19. 19

    Cressie, N.A.C. Statistics for Spatial Data (John Wiley & Sons, 1991).

  20. 20

    Ripley, B.D. Point processes for the earth sciences. in Quantitative Analysis of Mineral and Energy Resources (eds. Chung, C.F., Fabbri, A.G. & Sinding-Larsen, R.) 301–322 (Springer, 1988).

  21. 21

    Ohser, J. Statistics 14, 63–71 (1983).

    Google Scholar 

  22. 22

    Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Nat. Methods 11, 740–742 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  23. 23

    Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by the Swedish Research Council (grant 2017-01062 to R.S.), the European Research Council (grant 648842 to R.S.) and the Bert L. and N. Kuggie Vallee Foundation (R.S.).

Author information

Affiliations

Authors

Contributions

D.E. conceived the idea, developed the method, performed the analyses and wrote the manuscript. P.J. performed seqFISH and clustering analyses. R.S. supervised the project and wrote the manuscript.

Corresponding author

Correspondence to Rickard Sandberg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Power analysis of trendsceek.

We estimated the power of trendsceek with respect to the identification of genes with an expression hot-spot (A), step-gradient (B), non-radial streak (C) or linear gradients (D). (A-C) Expression values were empirically sampled from seqFISH data. (D) For linear gradients the effect size = highest expression / lowest expression. Sensitivity (y-axis) indicates the proportion of 100 independently simulated genes that was deemed significant (P ≤ 0.05), where cell-positions were randomized for each gene. Vertical lines indicate 95% bootstrap confidence interval of the mean sensitivity from repeated simulations (Methods). (A-D) Sample sizes of 100, 500 and 1000 cells were evaluated with 0.01, 0.05 and 0.1 fraction of spiked cells (A-C).

Supplementary Figure 2 trendsceek sensitivity depends on relative differences and not magnitude scale.

Sensitivity versus fold-change for three different background expression levels. (A) 10 out of 100 cells were spiked in a hotspot region. (B) Sensitivity was calculated as the fraction of 100 genes that were detectable where cell-positions were randomized for each gene. As the lines follow each other it is indicative of that the power is the same regardless of background expression level.

Supplementary Figure 3 Comparing trendsceek results with the identification of variable genes.

Trendsceek was run on the 500 most variable genes, indicated in yellow. Genes deemed significant by trendsceek (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05) are indicated in red. Blue indicates smoothed gene density histogram from all genes. Analyzed datasets were: (A) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 3”, n=269 array-spots), (B) spatial transcriptomics mouse olfactory bulb (tissue section “replicate 12”, n=280 array-spots), (C) spatial transcriptomics breast cancer biopsy (layer 2, n=251 array-spots), and (D) mouse gastrulation single-cell RNA-seq data (n=481 cells). (A-C) The red line illustrates a poisson distribution fit without any overdispersion. (D) The solid red line illustrates a negative binomial fit and the dashed line marks the expected position of genes with biological coefficient of variation = 0.5.

Supplementary Figure 4 Comparison of trendsceek mark-segregation summary statistics.

(A-D) Venn diagrams comparing significant genes identified with different trendsceek summary statistics for four different data sets; spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=269 array-spots) (A), tissue section “replicate 12” (n=280 array-spots) (B), spatial transcriptomics breast cancer (n=251 array-spots) (C), and mouse gastrulation single-cell data (n=481 cells) (D). (E-H) Pair-wise comparison of summary statistics for 500 most variable genes for the four different data sets (same order and sample sizes as for A-D). Blue indicates smoothed gene density histogram. Numbers in lower left triangle of the matrices indicate Pearson correlation between each pair of summary statistics.

Supplementary Figure 5 trendsceek statistics for significant genes from mouse olfactory bulb spatial transcriptomics data.

Marked point pattern statistics for significant genes (mark-correlation P ≤ 0.05, Benjamini-Hochberg adjusted, two-sided) for (A) spatial transcriptomics mouse olfactory bulb tissue section “replicate 3” (n=269 array-spots) and (B) mouse olfactory bulb tissue section “replicate 12” (n=280 array-spots).

Supplementary Figure 6 Spatial expression patterns identified in spatial transcriptomics mouse olfactory bulb data.

Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics mouse olfactory datasets (A, B) tissue section “replicate 3” (n=269 array-spots) and (C, D) tissue section “replicate 12” (n=280 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B, D) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).

Supplementary Figure 7 trendsceek statistics for significant genes from spatial transcriptomics breast cancer data.

Marked point pattern statistics for significant (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial transcriptomics breast cancer data (n=251 array-spots).

Supplementary Figure 8 Spatial expression patterns identified in spatial transcriptomics breast cancer data.

(A) Spatial expression patterns of significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for spatial-transcriptomics breast-cancer dataset (layer 2, n=251 array-spots). Expression levels were scaled to the range [0-1] for visualisation purposes. Red circles in the density plots (B) indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05). (C) Heatmap with genes by row and cells by column, where black indicate that a cell was deemed by trendsceek to be in a region of significantly upregulated expression. Upper bar indicates cell-clusters and left bar gene-clusters (agglomerative hierarchical clustering, Euclidean distance, Ward’s criterion). (D) Spatial layout of the cell- and gene-clusters identified in (C).

Supplementary Figure 9 trendsceek statistics for significant genes from mouse gastrulation single-cell RNA-seq data.

Marked point pattern statistics for significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq data (n=481 cells).

Supplementary Figure 10 Spatial expression patterns identified in mouse gastrulation single-cell RNA-seq data.

Spatial expression patterns of trendsceek-significant genes (by at least one of the four statistical tests, Benjamini-Hochberg adjusted P < 0.05, two-sided) for mouse gastrulation single-cell RNA-seq dataset (n=481 cells). Expression levels were scaled to the range [0-1] for visualisation purposes. (B) Red circles in the density plots indicate cells in regions with significantly higher expression (one-sided kernel density estimation, P ≤ 0.05).

Supplementary Figure 11 trendsceek statistics for selected significant genes from mouse hippocampus seqFISH data.

Marked point pattern statistics for selected significant genes from seqFISH data from mouse hippocampus regions (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Scatter- and density-plots for the same genes are shown in Supplementary Figure 12. Cell positions were scaled to [0-1].

Supplementary Figure 12 Spatial gene expression patterns found in regions of mouse hippocampus from seqFISH data.

(A) Cartoon of hippocampus with the 21 imaged regions labeled. (B) Visualization of the 15 hippocampus regions where significant spatial gene expression patterns were identified with trendsceek (Benjamini-Hochberg adjusted P< 0.01 for at least one of the four statistic tests, two-sided). Colors indicate cell-clusters identified by clustering the binary cell-detection matrix output by trendsceek. The number after the image region label represents the number of cells. (C) Representative gene examples of spatial gene expression patterns from each region. Expression was scaled to the range zero to one by unity-based normalization. Cells are colored red if they exceeded a 5% significance level (based on wKDE, indicated by the blue surface, one-sided). The number after the image region label represents the image field number provided in the data source.

Supplementary Figure 13 Comparison of genes identified by trendsceek with genes from a differential expression analysis using cell cluster information.

Differential expression analysis was performed for seven different spatial patterns, where cell groups were derived by trendsceek from genes representing the pattern (Methods). (A) Fold-change versus SCDE p-value (two-sided test using the multiple testing corrected (Holm procedure) Z-score) for all genes deemed significant by SCDE among the same set of 500 genes that was input to trendsceek (P ≤ 0.05, Benjamini-Hochberg adjusted, fold-change ≥ 1.5). Red points indicate genes identified by trendsceek. (B) Number of significant genes for SCDE and trendsceek for the seven spatial patterns (see Figure 2 for the gene patterns). Percentage indicates fraction of SCDE-significant genes also identified by trendsceek.

Supplementary Figure 14 Comparison of gene rankings between trendsceek and gene variability.

The ranking of genes identified by trendsceek (mark-correlation ranks) was compared to spatially unaware gene variability rankings (Methods) for spatial transcriptomics mouse olfactory bulb, tissue section “replicate 3” (n=35 genes) (A), tissue section “replicate 12” (n=45 genes) (B), spatial transcriptomics breast cancer layer 2 (n=15 genes) (C) and mouse gastrulation single-cell RNA-seq data (n=107 genes) (D). Horizontal lines indicate the top 50 ranked variable genes.

Supplementary Figure 15 Runtime of trendsceek on different data sets.

(A) Runtime against number of most variable genes in mouse olfactory bulb dataset replicate 12 (n=280 array-spots). (B) Runtime for the 500 most variable genes in each of three datasets, spatial transcriptomics data from mouse olfactory bulb replicate 12 (ST-MOB, n=280 array-spots), breast cancer layer 2 (ST-BC, n=251 array-spots) and scRNA-seq mouse epiblast cells (SC-MEPI, n=481 cells). The dashed line indicates a quadratic function. Trendsceek is implemented to be run in parallel by specifying an “ncores” argument and each run was done using 70 cores (Intel Xeon 2.3GHz).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15

Life Sciences Reporting Summary

Supplementary Table 1

Number of significant genes identified with the different spatial trend metrics

Supplementary Table 2

List of genes with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 3)

Supplementary Table 3

List of cells with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 3)

Supplementary Table 4

List of genes with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 12)

Supplementary Table 5

List of cells with significant spatial expression patterns in spatial transcriptomics data from mouse olfactory bulb (replicate 12)

Supplementary Table 6

List of genes with significant spatial expression patterns in spatial transcriptomics data from breast cancer (layer 2)

Supplementary Table 7

List of cells with significant spatial expression patterns in spatial transcriptomics data from breast cancer (layer 2)

Supplementary Table 8

List of genes with significant spatial expression patterns in dissociated single-cell RNA-seq data from mouse embryo

Supplementary Table 9

List of cells with significant spatial expression patterns in dissociated single-cell RNA-seq data from mouse embryo

Supplementary Table 10

List of genes with significant spatial expression patterns in seqFISH data from mouse hippocampus

Supplementary Table 11

List of cells with significant spatial expression patterns in seqFISH data from mouse hippocampus

Supplementary Software

trendsceek package

Source data

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Edsgärd, D., Johnsson, P. & Sandberg, R. Identification of spatial expression trends in single-cell gene expression data. Nat Methods 15, 339–342 (2018). https://doi.org/10.1038/nmeth.4634

Download citation

Further reading

Search

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing