Spatial cellular architecture predicts prognosis in glioblastoma

Zheng, Yuanning; Carrillo-Perez, Francisco; Pizurica, Marija; Heiland, Dieter Henrik; Gevaert, Olivier

doi:10.1038/s41467-023-39933-0


Article
Open access
Published: 11 July 2023

Nature Communications volume 14, Article number: 4122 (2023)

Abstract

Intra-tumoral heterogeneity and cell-state plasticity are key drivers for the therapeutic resistance of glioblastoma. Here, we investigate the association between spatial cellular organization and glioblastoma prognosis. Leveraging single-cell RNA-seq and spatial transcriptomics data, we develop a deep learning model to predict transcriptional subtypes of glioblastoma cells from histology images. Employing this model, we phenotypically analyze 40 million tissue spots from 410 patients and identify consistent associations between tumor architecture and prognosis across two independent cohorts. Patients with poor prognosis exhibit higher proportions of tumor cells expressing a hypoxia-induced transcriptional program. Furthermore, a clustering pattern of astrocyte-like tumor cells is associated with worse prognosis, while dispersion and connection of the astrocytes with other transcriptional subtypes correlate with decreased risk. To validate these results, we develop a separate deep learning model that utilizes histology images to predict prognosis. Applying this model to spatial transcriptomics data reveals survival-associated regional gene expression programs. Overall, our study presents a scalable approach to unravel the transcriptional heterogeneity of glioblastoma and establishes a critical connection between spatial cellular architecture and clinical outcomes.

Introduction

Glioblastoma (GBM) represents the most common and aggressive form of malignant tumor in the central nervous system, characterized by a low five-year survival rate of 6.8%¹. Despite advancements in diagnostic techniques and treatment modalities, therapeutic resistance and tumor recurrence continue to challenge clinical outcomes². One of the major obstacles precluding the development of effective therapeutics is tumor heterogeneity^3,4. Malignant cells demonstrate differences in genetic lesions, epigenetic states, and gene expression profiles^5,6. In addition, tumors from different patients have distinct cell-type compositions and spatial cellular organization. In the past decade, studies based on single-cell RNA sequencing (scRNA-seq) have guided our understanding of intra-tumoral heterogeneity^{5,7,8,9,10,11}. GBM cells span four major cellular states: (1) neural-progenitor-like (NPC-like), (2) oligodendrocyte-progenitor-like (OPC-like), (3) astrocyte-like (AC-like), and (4) mesenchymal-like (MES-like)⁵. Integrated analysis of GBM and normal developmental brains revealed a conserved trilineage differentiation hierarchy of GBM cells that mirrors normal neurodevelopment¹⁰.

While scRNA-seq can profile transcriptomes of thousands of cells in a single experiment, it only provides indirect inference of cell-to-cell interactions due to the loss of spatial information. In brain malignancies, cellular interactions and tumor architecture are key factors driving the clonal evolution, tumor progression and therapeutic resistance^12,13,14,15. Recent advancements of spatial transcriptomics technologies have enabled in situ transcriptome profiling without the need for tissue dissociation. This provides a unique opportunity to decipher how malignant cells are spatially organized and interact with their immediate microenvironment. Recent studies based on spatial transcriptomics have revealed spatial localization of GBM cells with distinct transcriptional phenotypes^16,17,18.

Although scRNA-seq and spatial transcriptomics enable us to decipher tumor compositions, these technologies are expensive, require specialized expertise, and are not included as a routine assay for cancer diagnosis, which restricts their clinical applications. Therefore, how cellular composition and spatial architecture contribute to patient prognosis has not been completely resolved. Histology images, on the other hand, are widely available and easier to obtain than transcriptome profiles. In addition, the recent advancement in digital profiling of whole-slide images (WSIs) has enabled the generation of high-resolution cellular maps of tumors from large patient cohorts¹⁹. These technical advances have motivated studies that use deep learning to automate clinical diagnosis^20,21, detect metastasis²², quantify immune-infiltrating cells^23,24, classify cancer subtypes and predict tumor grade^25,26. Other studies have used deep learning to predict molecular traits, such as gene expression²⁷, mutations²⁸, copy number alterations²⁹ and hormone receptor status³⁰. Since molecular profiles are known to shape cell morphological features, we hypothesize that the transcriptional subtypes of malignant cells can be inferred from histology images, and that this will enable us to computationally reconstruct cellular maps with informed transcriptional subtypes and link spatial cellular architectures to clinical outcomes.

In this study, we utilize a reciprocal approach to investigate the effect of transcriptional subtype compositions and spatial cellular organization on GBM prognosis. Firstly, we develop a deep learning model capable of predicting the transcriptional subtypes of malignant cells based on histology images. The model is trained using spatial transcriptomics data and validated in external testing cohorts. Leveraging histology images from 410 patients, we phenotypically analyze 40 million tissue spots and identify consistent associations between tumor architecture and prognosis across two independent GBM cohorts. Additionally, we train a separate deep learning model that leverages histology images to predict prognosis. Applying this model to spatial transcriptomics data leads to the identification of survival-associated regional gene expression programs. Finally, we develop a user-friendly software tool, named GBM360, that allows users to characterize tissue compositions and spatial cellular organization of new GBM cases. Although the current study focuses on GBM, the multi-modal data integration framework presented here is scalable to other diseases.

Results

Identifications of spatially resolved transcriptional subtypes

To resolve the transcriptional heterogeneity of GBM within the spatial context, we performed an integrative analysis of three spatial transcriptomics datasets (Supplementary Data 1)^16,18. The integrated dataset comprised 23 GBM samples obtained from 22 patients. Each sample contained 2,500 to 4,702 gene expression spots, resulting in 75,625 transcriptomes. Data preprocessing and batch-effect normalization are described in the “Methods” section. To determine the number of cells in each spot, we performed nuclei segmentation on histology images. The cell count ranged from 3 to 38, with an average count of 13 cells (Supplementary Fig. 1a). To determine genomic abnormalities of the GBM samples, we inferred copy number alterations (CNAs) using the transcriptomics profile of each spot, where data from a separate cohort of normal brain tissues (n = 6 tissues from 3 patients) were used as a reference³¹. Tumor samples demonstrated broad CNAs across chromosomes, including gains of Chr 6 and Chr 7 and losses of Chr 8, Chr 10 and Chr 14 (Fig. 1a and Supplementary Fig. 1b). Since GBM cells are highly infiltrative, each spot may contain a mixture of tumor cells and normal brain tissues. To estimate the tumor cell content within each spot, we first designated a prominent CNA event that was shared across all the spots in each tumor as the tumor signature CNA. The tumor cell content was then estimated based on the score of the CNA signature (“Methods”). At least three signature events were calculated in each tumor to ensure robust and unbiased estimations. We found that our approach was able to distinguish tumor regions from histologically normal peripheral tissues (Supplementary Figs. 1c, d). Therefore, we used the CNA-based estimation of tumor cell content to select malignant spots, and spots with low (<20%) tumor cell content were removed from our subsequent analysis.

**Fig. 1: Identifications of spatial gene expression programs in GBM.**

To identify transcriptional subtypes of the malignant spots, we employed two complementary approaches. First, we performed consensus non-negative matrix factorization (cNMF)³² using transcriptomes from malignant spots to identify recurrent gene expression modules across the patients. Second, we performed computational deconvolution of the spots using data from single-cell RNA-seq as references. Through the cNMF analysis, we discovered five distinct meta-gene modules (Fig. 1b and Supplementary Data 2). To differentiate these modules from the published single-cell RNA-seq modules⁵, we named each of them as “nmf.x” (e.g., nmf.NPC). The first two modules were associated with neuronal lineage development and synaptic functions (Fig. 1b and Supplementary Figs. 2a, b). Module #1 was enriched with markers for neural progenitor cells (SNAP25, CD24 and SYN1)³³, while module #2 was strongly associated with oligodendrocyte progenitors (PLP1, CNP, MBP)^6,8. Therefore, we designated module #1 as nmf.NPC and module #2 as nmf.OPC. Module #3 exhibited co-expression of astrocytic markers (GFAP and APOC1)⁶ and genes involved in antigen processing and inflammatory response (e.g., HLA-DRA, B2M, and CD74) (Fig. 1b and Supplementary Fig. 2c). This co-expression pattern likely reflects the reactive transformation of astrocytes^34,35,36. Therefore, we named module #3 as nmf.RA (reactive astrocytes). The remaining two modules, #4 and #5, were enriched with mesenchymal (MES)-related genes, such as VIM and COL6A1 (Fig. 1b). Module #4 demonstrated enrichment in glycolytic process (GAPDH, PGK1, LDHA) and hypoxia response (VEGFA, HILPDA, ADM) (Supplementary Fig. 2d), hence named as nmf.MES-hypoxia. On the other hand, module #5 was enriched with genes encoding extracellular matrix (COL6A1, FN1, MMP9), but lacked hypoxia signatures, and was designated as nmf.MES-like (Supplementary Fig. 2e).

To assess how the cNMF modules were related to published transcriptional modules in GBM, we performed spatially weighted correlation analyses. We first compared each cNMF module to the modules defined in a spatial transcriptomics study by Ravi et al.¹⁶. As expected, we observed strong correlations between the nmf.NPC and nmf.OPC modules with the Regional.NPC and Regional.OPC modules, respectively (Fig. 1c, P < 0.001). The nmf.RA correlated with both the Radial.Glia (P < 0.001) and Reactive.Immune (P < 0.01) modules. This overlap is expected due to the close relationship identified between the Radial.Glia and Reactive.Immune modules in the original study¹⁶. The nmf.MES-like and nmf.MES-hypoxia modules were significantly correlated with the Reactive.Immune and Reactive.Hypoxia modules, respectively (Fig. 1c). To provide additional validation, we further compared our modules to those defined from a single-cell RNA-seq study by Neftel et al.⁵. The nmf.NPC, nmf.OPC, and nmf.RA modules demonstrated strong correlations with the NPClike, OPClike and AClike modules, respectively (Fig. 1d). Similarly, the nmf.MES-like and nmf.MES-hypoxia modules were strongly correlated with the MESlike1 and MESlike2 modules, with the nmf.MES-hypoxia module specifically correlated with the MESlike2 module. Notably, the MESlike2 module identified by Neftel et al. was also enriched with hypoxia-response genes, demonstrating the strong relationship between the cNMF modules and those derived from single-cell RNA-seq. By analyzing the transcriptional subtypes of spots determined by the top-scoring cNMF module, we discovered that each tumor harbored multiple transcriptional subtypes (Fig. 1e). Spots of different subtypes were localized within distinct spatially segmented regions (Fig. 1f). These observations highlight the transcriptional diversity and spatial heterogeneity within GBM tumors.

Next, we performed deconvolution analysis to estimate the fraction of different GBM cell types within the malignant spots using data from single-cell RNA-seq (Fig. 1g and “Methods”). To capture both tumor cells and immune cells, we integrated three single-cell RNA-seq datasets as references: GSE131928, GSE163108, and GSE84465 (Supplementary Data 1)^5,11,37. For tumor cells, we included four transcriptional subtypes from the reference: (1) NPClike, (2) OPClike, (3) AClike, (4) MESlike. The MESlike cells were further classified into hypoxia-dependent (MEShypoxia) and hypoxia-independent (MESlike) groups based on the expression of hypoxia-response genes (e.g., HILPDA, VEGFA) and glycolytic genes (e.g., GAPDH, LDHA)⁵. For immune cells, we focused on T cells and macrophages. Details of the deconvolution analysis and validation of the results can be found in the “Methods” section (Supplementary Figs. 1e, f). Based on our deconvolution analysis, we found that the composition of tumor cells was relatively homogeneous within individual spots. In most spots, the dominant tumor cell type accounted for over 70% of all tumor cells (Fig. 1h). Notably, approximately 30% of the spots consisted exclusively of one tumor cell type (Fig. 1h). Analysis of the immune cell distributions showed that 49% of the spots contained one immune cell type, and approximately 8% of the spots consisted of a mixture of both T cells and macrophages (Fig. 1i). Overall, the spots classified as nmf.NPC, nmf.OPC, nmf.RA, nmf.MES-like, nmf.MES-hypoxia were enriched with NPClike, OPClike, AClike, MESlike and MEShypoxia cells, respectively (Fig. 1j). In addition, the nmf.RA spots had increased macrophage infiltration compared to the other spots, while the nmf.MES-like spots had increased proportions of T cells. These results demonstrated that a single spot was typically dominated by one tumor cell type, while the tumor cells were frequently mixed with immune cells.
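
The summary statistics above (dominant tumor cell type, its share of all tumor cells, and the presence of immune cells) can be illustrated with a minimal sketch. It assumes each spot's deconvolution result is available as a dictionary of estimated fractions; the function name `summarize_spot`, the label keys `Tcell` and `Macrophage`, and the `immune_threshold` parameter are hypothetical and not taken from the study.

```python
# Hypothetical label names; the study's actual identifiers may differ.
TUMOR_TYPES = ["NPClike", "OPClike", "AClike", "MESlike", "MEShypoxia"]
IMMUNE_TYPES = ["Tcell", "Macrophage"]


def summarize_spot(fractions, immune_threshold=0.0):
    """Summarize one spot from its deconvolution fractions.

    fractions: dict mapping cell type -> estimated fraction in the spot.
    Returns (dominant tumor type or None, share of the dominant type among
    tumor cells, dict of immune type -> present?).
    The presence threshold is an assumption made for illustration.
    """
    tumor = {t: fractions.get(t, 0.0) for t in TUMOR_TYPES}
    total_tumor = sum(tumor.values())
    if total_tumor > 0:
        dominant = max(tumor, key=tumor.get)
        share = tumor[dominant] / total_tumor
    else:
        dominant, share = None, 0.0
    immune = {t: fractions.get(t, 0.0) > immune_threshold for t in IMMUNE_TYPES}
    return dominant, share, immune


example = {"AClike": 0.6, "MESlike": 0.2, "Macrophage": 0.2}
dominant, share, immune = summarize_spot(example)
```

In this example the AC-like cells make up three quarters of the tumor cells in the spot, and macrophages are flagged as present alongside them.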

Transcriptional subtypes can be predicted from histology images

Since gene expression features are known to shape cell morphology, we hypothesized that the cell-type distribution can be inferred from histology images. We developed GBM-CNN, a convolutional neural network for image classification (Fig. 2a). The inputs to GBM-CNN were patches extracted from hematoxylin-eosin (H&E)-stained histology images. The edge length (56 $\mu$m) of one patch was roughly equal to the diameter (55 $\mu$m) of one gene expression spot in spatial transcriptomics. The output was the cell types present in each patch. We formulated the problem as a multi-label classification task. For predicting tumor cells, since each spot was predominantly occupied by only one cell type (Fig. 1h), we aimed at predicting the dominant type of tumor cells in each patch. For immune cells, since both T cells and macrophages were frequently mixed with tumor cells (Fig. 1i, j), we included them as independent labels. To assess the performance of GBM-CNN, we carried out leave-one-out cross-validation (LOOCV) using data from the spatial transcriptomics cohort (n = 23 tumors; n = 69,647 spots). In each iteration, the model was trained on spots from 22 tumors, while spots from the remaining tumor (n = 1) were reserved for held-out validation. To prevent overfitting, the model architecture and associated hyperparameters remained consistent across all iterations (“Methods”). After each iteration, we evaluated the model’s performance on the validation sample. For predicting tumor cells, the F1 score was 0.86, and the standard deviation (SD) was 0.15 (Fig. 2b). The area under the receiver operating characteristic curve (AUROC) was 0.93 and the SD was 0.05 (Supplementary Fig. 3a). Spatial visualization showed that the model accurately predicted the distribution of dominant cell types in even the most heterogeneous tumors within the cohort (Supplementary Fig. 3b).
To assess whether the dominant tumor cell type correlated with any histological features, we extracted the color values of each image channel, along with the texture and histogram features from the H&E images (“Methods”). Our analysis showed that the extracted image features varied across different samples (Supplementary Fig. 3c). However, when we performed the same analysis on the feature vector (2048 $\times$ 1) extracted from the last layer of the ResNet50 module (Fig. 2a), we observed a strong correlation between the resulting clusters and the transcriptional subtypes in the validation samples (Supplementary Fig. 3d). These results indicated that GBM-CNN learned latent representations of the transcriptional subtypes beyond raw histological features.
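
The leave-one-tumor-out evaluation described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and because the text does not state how the F1 score was averaged across classes, the macro-averaged F1 used here is an assumption.

```python
from collections import defaultdict


def leave_one_tumor_out(spot_tumor_ids):
    """Yield (held_out_tumor, train_indices, validation_indices).

    spot_tumor_ids: list giving the tumor ID of each spot. All spots from
    one tumor are held out together, so no tumor contributes to both the
    training and the validation set within an iteration.
    """
    by_tumor = defaultdict(list)
    for idx, tumor in enumerate(spot_tumor_ids):
        by_tumor[tumor].append(idx)
    for held_out in sorted(by_tumor):
        train = sorted(
            i for tumor, idxs in by_tumor.items() if tumor != held_out for i in idxs
        )
        yield held_out, train, by_tumor[held_out]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


folds = list(leave_one_tumor_out(["A", "A", "B", "C", "C", "C"]))
```

Splitting by tumor rather than by spot matters here because neighboring spots from the same tissue section are highly similar, so a spot-level split would likely overestimate performance.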

**Fig. 2: Development and validation of GBM-CNN for spatially resolved transcriptional subtype prediction.**

We next assessed the performance of GBM-CNN in predicting immune cells. Since T cells and macrophages were treated as independent labels, we assessed the sensitivity and specificity for each cell type separately (Fig. 2c, d). For predicting T cells, the sensitivity was 0.80 (SD: 0.08), specificity was 0.83 (SD: 0.05), and AUROC was 0.80 (SD: 0.08) (Fig. 2c and Supplementary Fig. 3e). For predicting macrophages, the sensitivity was 0.84 (SD: 0.09), specificity was 0.81 (SD: 0.07), and AUROC was 0.89 (SD: 0.11) (Fig. 2d and Supplementary Fig. 3f). These results demonstrated that the GBM-CNN was able to accurately predict the subtypes of tumor cells and the presence of immune cells from histology images.

To test whether the classification performance of GBM-CNN can be generalized to external patient cohorts, we applied the image model to whole-slide images (WSIs) from the IvyGap cohort (n = 184 slides from n = 8 patients)³⁸ and the TCGA-GBM cohort (n = 693 slides from 312 patients)³⁹. The IvyGap cohort included data from in-situ RNA hybridization for 343 GBM-related genes and their adjacent H&E-stained histology sections. We expected that tumor cells assigned with a specific transcriptional subtype from GBM-CNN should have high messenger RNA (mRNA) expression levels (i.e., ground truth) for the corresponding cell-type signatures. As shown in Fig. 2e, f, tumor regions classified as “MESlike” showed high mRNA expression levels of the MES-related signatures, such as COL4A1, PDPN. In addition, regions classified as “MES-hypoxia” aligned with the expression of HIF1A. Similarly, regions classified as “NPC-like” and “OPC-like” were associated with the expression of SNAP25 and OLIG2, respectively. Regions classified as “AC-like” displayed elevated expression of PTPRZ1 and EGFR, and regions with macrophage infiltration aligned with the expression of CD163 (Fig. 2f).

To further validate GBM-CNN, we predicted the transcriptional subtypes of tumor cells using WSIs from the TCGA-GBM cohort³⁹. Each tumor was composed of three to five transcriptional subtypes, with 76.2% of the tumors consisting of all five subtypes (Fig. 2g, h and Supplementary Fig. 3g). Although spatial transcriptomics data were not available for this cohort, we estimated transcriptional subtype proportions of each tumor by computationally deconvoluting the matched bulk RNA-seq data (“Methods”). Since bulk RNA-seq and histology images are independent modalities, we could validate our image model by comparing the subtype composition estimated from RNA-seq deconvolution versus our image predictions. Remarkably, we observed a significant correlation between the transcriptional subtype proportions estimated from bulk RNA-seq deconvolution and those predicted by our image model (Fig. 2g, h, i). These results indicated that the transcriptional subtypes of malignant cells can be predicted from histology images with GBM-CNN.

Associations between the transcriptional subtype composition and prognosis

To assess how the predicted transcriptional subtype composition was associated with prognosis, we used diagnostic slides from patients of the TCGA cohort³⁹ as the discovery cohort, and slides from the CPTAC cohort⁴⁰ as the validation cohort (Supplementary Data 1). Since the absolute size of the resected tumor region varied across patients, downstream analysis of tissue composition may be biased toward tumors with large resections. To overcome this potential sampling bias, we implemented two strategies. First, we ranked the tumors in each cohort based on their number of patches (indicating tissue size) and removed the bottom 5% of tumors with the smallest number of patches. Second, we included gender, age, IDH status, and tissue size as covariates in our Cox regression analysis. Following this rigorous filtering strategy, we obtained a final set of 693 slides (n = 312 patients) in the TCGA cohort and 227 slides (n = 98 patients) in the CPTAC cohort. Our multivariate Cox regression analysis showed that samples with a high proportion of patches classified as MES-hypoxia were associated with a worse prognosis (Table 1). In the TCGA cohort, the hazard ratio (HR) for the MES-hypoxia subtype was 2.06 (P = 0.008), and in the CPTAC cohort it was 2.23 (P = 0.01). Moreover, the proportion of the NPC-like subtype was associated with a better prognosis in both cohorts, although these associations did not reach statistical significance.
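
The two preprocessing steps that feed the Cox regression (per-tumor subtype proportions and removal of the smallest tumors) might look like the sketch below. The function names are hypothetical, and rounding the number of removed tumors down is an assumption, since the text only states that the bottom 5% were removed.

```python
from collections import Counter


def subtype_proportions(patch_labels):
    """Fraction of a tumor's patches assigned to each predicted subtype."""
    counts = Counter(patch_labels)
    total = len(patch_labels)
    return {subtype: n / total for subtype, n in counts.items()}


def drop_smallest_tumors(patch_counts, fraction=0.05):
    """Return the tumor IDs kept after removing the smallest tumors.

    patch_counts: dict mapping tumor ID -> number of patches (tissue size).
    The number of removed tumors is rounded down (an assumption).
    """
    ranked = sorted(patch_counts, key=patch_counts.get)
    n_drop = int(len(ranked) * fraction)
    return set(ranked[n_drop:])


proportions = subtype_proportions(["MES-hypoxia", "AC-like", "AC-like", "NPC-like"])
kept = drop_smallest_tumors({f"tumor_{i}": 100 * (i + 1) for i in range(30)})
```

The resulting proportions, together with the covariates named above, would then be passed to a Cox proportional hazards model from a survival analysis package.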

Table 1 Cox regression analysis showing the effect of subtype proportions on prognosis

Associations between spatial cellular architecture and prognosis

The results presented so far have linked transcriptional subtype compositions to patient prognosis. However, since spatial organization and cellular interactions play critical roles in driving clonal evolution, tumor progression and therapeutic resistance^12,13,14,15, we sought to assess how the spatial distribution of malignant cells contributes to prognosis. To characterize the spatial cellular organization, we first constructed a spatial neighborhood graph to represent cell communities within each tumor (Fig. 3a). In this graph, each patch was a node and edges represented direct connections between patches. The phenotype of each node was the predicted transcriptional subtype represented by the dominant tumor cell type in each patch. To ensure unbiased exploration of the spatial cellular organization and its association with prognosis, we included subtype proportions as covariates in our survival analysis.
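
A minimal sketch of such a spatial neighborhood graph is shown below. It assumes that patches lie on a regular grid indexed by (row, column) and that a "direct connection" means one of the eight surrounding grid positions; both are illustrative assumptions, and the function name is hypothetical.

```python
# Offsets to the eight surrounding grid positions (assumed neighborhood).
NEIGHBOR_OFFSETS = [
    (dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
]


def build_neighborhood_graph(patch_subtypes):
    """Build an undirected graph over patches.

    patch_subtypes: dict mapping (row, col) -> predicted subtype; the keys
    are the nodes and the values are the node phenotypes.
    Returns a dict mapping each node to the set of its adjacent nodes.
    """
    adjacency = {node: set() for node in patch_subtypes}
    for (row, col) in patch_subtypes:
        for dr, dc in NEIGHBOR_OFFSETS:
            neighbor = (row + dr, col + dc)
            if neighbor in patch_subtypes:
                adjacency[(row, col)].add(neighbor)
    return adjacency


patches = {
    (0, 0): "AC-like",
    (0, 1): "AC-like",
    (1, 0): "NPC-like",
    (1, 1): "MES-hypoxia",
    (5, 5): "OPC-like",  # isolated patch with no adjacent tissue
}
graph = build_neighborhood_graph(patches)
```

Because every patch checks all eight offsets, each edge is recorded from both endpoints and the adjacency sets are symmetric.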

We first assessed how the frequency of interactions between different transcriptional subtypes contributed to prognosis. Our results revealed that an increased connectivity between AC-like subtypes corresponded to an increased risk (Fig. 3b, c and Fig. 3d, e, TCGA: HR = 3.70, P < 0.01; CPTAC: HR = 3.32, P < 0.01). Conversely, when the AC-like subtype was connected with other subtypes, such as NPC-like, MES-like, and MES-hypoxia, the risk was decreased (Fig. 3b, c and Fig. 3f, g). To further confirm these results, we examined the clustering coefficient, which indicates the degree to which the same transcriptional subtype clusters together in the spatial neighborhood graph (“Methods”). We found that a higher clustering coefficient for the AC-like subtype was associated with a poorer prognosis (Table 2, TCGA: HR = 3.82, P = 0.004; CPTAC: HR = 3.96, P = 0.009). Notably, the proportion of the AC-like subtype alone was not a significant predictor of prognosis (Table 1), highlighting the value of spatial relationships over abundance alone.
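
The two graph statistics used above can be sketched as follows. The exact definitions are given in the Methods and are not reproduced here, so this block is one plausible reading rather than the authors' implementation: interaction frequency is taken as the share of edges joining each pair of subtypes, and the subtype clustering coefficient as the mean local clustering coefficient on the subgraph induced by patches of that subtype. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def interaction_frequencies(adjacency, labels):
    """Share of undirected edges that join each (sorted) pair of subtypes."""
    counts = Counter()
    for node, neighbors in adjacency.items():
        for neighbor in neighbors:
            if node < neighbor:  # count each undirected edge once
                counts[tuple(sorted((labels[node], labels[neighbor])))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def subtype_clustering_coefficient(adjacency, labels, subtype):
    """Mean local clustering coefficient within one subtype (assumed definition).

    Only edges between nodes of the given subtype are considered. A node
    with fewer than two same-subtype neighbors contributes 0.
    """
    members = {node for node, label in labels.items() if label == subtype}
    if not members:
        return 0.0
    coefficients = []
    for node in members:
        neighbors = adjacency[node] & members
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# Toy graph: three mutually connected AC-like patches and one NPC-like patch.
adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
labels = {"a": "AC-like", "b": "AC-like", "c": "AC-like", "d": "NPC-like"}
freqs = interaction_frequencies(adjacency, labels)
ac_clustering = subtype_clustering_coefficient(adjacency, labels, "AC-like")
```

In the toy graph, three of the four edges join AC-like patches to each other and the AC-like subgraph is a closed triangle, which corresponds to the tightly clustered pattern associated with higher risk in the text.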

Table 2 Cox regression analysis showing the effect of clustering coefficient on prognosis

Furthermore, our analysis revealed that a higher interaction between the OPC-like and MES-hypoxia subtypes was strongly associated with a poorer prognosis in both cohorts (Fig. 3b, c and Fig. 3h, i, TCGA: HR = 4.30, P < 0.001; CPTAC: HR = 4.82, P < 0.001). Overall, these findings underscored the significance of spatial interactions between transcriptional subtypes in affecting patient prognosis.

In situ identifications of gene expression markers associated with prognosis

The results presented so far have established a connection between transcriptional subtype compositions and spatial architecture with patient prognosis. To further explore spatial gene expression programs associated with prognosis, we developed a separate deep learning model that used histology images to predict prognosis (Fig. 4a). The model aimed at assigning an aggressive score to each patch, with higher scores contributing to a worse prognosis. We evaluated the model’s performance through a five-fold cross-validation using data from the TCGA-GBM cohort (n = 693 slides from n = 312 patients). Additionally, we tested the model trained on the TCGA cohort on the CPTAC cohort (n = 227 slides from n = 98 patients). To assess the accuracy of the model, we derived a composite score (CS) that combined the concordance index (C-index)^41,42 and integrated Brier scores^43,44 (“Methods”). The CS ranges from 0.0 to 1.0, with higher values indicating more accurate predictions. In the TCGA cohort, the model achieved a CS of 0.74 (SD: 0.03). Furthermore, we divided the patients into a high-risk and a low-risk group using the median predicted score. Patients in the high-risk group showed significantly worse prognosis compared to patients in the low-risk group (Supplementary Fig. 4a, log-rank test, P = 2.47E-07). To benchmark the model’s performance, we compared it to a baseline model, where aggressive scores were predicted by a random, untrained model with the same architecture. The CS for the trained model was significantly higher than that of the baseline model (Supplementary Fig. 4b, Mann-Whitney U test, P = 0.004). Similarly, in the CPTAC testing cohort, the model achieved a CS of 0.75, and patients assigned high aggressive scores had significantly worse prognosis compared to those with low aggressive scores (Supplementary Fig. 4c, log-rank test, P = 0.03).
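
As an illustration of one ingredient of the evaluation, the sketch below computes Harrell's concordance index and the median split into risk groups. The way the C-index and the integrated Brier score are combined into the composite score is defined in the Methods and is deliberately not reproduced here. The function names are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from statistics import median


def concordance_index(times, events, scores):
    """Harrell's C-index for risk scores (higher score = higher risk).

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    A pair (i, j) is comparable when patient i had an observed event before
    time j. Tied scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def median_split(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
c_index = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
groups = median_split([0.9, 0.7, 0.4, 0.1])
```

A C-index of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.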

To identify survival-associated spatial gene expression programs, we next used the trained prognostic model to predict an aggressive score for each spot (n = 69,647) in the spatial transcriptomics cohort using the paired histology images (Fig. 4b). Our analysis revealed that the aggressive scores were significantly different between transcriptional subtypes (Fig. 4c–e). The MES-hypoxia subtype was assigned with the highest aggressive scores, followed by the reactive astrocytes, MES-like, NPC-like, and OPC-like cells (Fig. 4e). To identify genes associated with prognosis, we divided all spots using the median score of this cohort, and spots assigned with high scores were compared to those with low scores. We discovered 4,569 genes that were significantly upregulated in regions with high aggressiveness (Fig. 4f and Supplementary Data 3, log2FC > 0.25, P < 0.01) and 1,984 genes significantly upregulated in regions with low aggressiveness (Supplementary Data 4, log2FC > 0.25, P < 0.01). Gene set analysis showed genes related to high aggressiveness (Fig. 4g) were involved in the regulation of injury response (SOD2, TNR, PTN), glycoprotein metabolic process (MT3, RAMP1, CST3), antigen processing (AZGP1, CD74, HLA-DRA), response to oxidative stress (RHOB, PON2, AQO1), and gliogenesis (PTPRZ1, GFAP, SOX4). On the other hand, genes related to low aggressiveness (Fig. 4h) were associated with neuronal development, including neural nucleus development (MBP, CALM1, CNP), oligodendrocyte differentiation (PLP1, OPALIN, MAG), neurotransmitter transport (SNAP25, SYT1, SLC17A7) and axon development (STMN1, UCHL1, CCK). Some genes were known to be associated with GBM prognosis, such as PTPRZ1⁴⁵ and EGFR⁴⁶. However, we identified many other genes that were previously unknown, such as SNRPD3, TPST1 and GUCD1. These data demonstrated that the reactive transformations of malignant cells in response to hypoxic environment and inflammatory stimuli contributed to a worse prognosis.
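
The comparison of high-score and low-score spots can be sketched for a single gene as follows. This is only an illustration: the function name is hypothetical, the pseudocount is an assumption, and the statistical test behind the reported P values is not reproduced.

```python
from math import log2
from statistics import mean, median


def log2_fold_change(expression, scores, pseudocount=1.0):
    """log2 ratio of mean expression in high-score versus low-score spots.

    expression: expression values of one gene across spots.
    scores: predicted aggressive score of each spot.
    Spots are split at the median score of the cohort. The pseudocount
    (an assumption) avoids division by zero. Raises StatisticsError if
    either group is empty.
    """
    cutoff = median(scores)
    high = [x for x, s in zip(expression, scores) if s > cutoff]
    low = [x for x, s in zip(expression, scores) if s <= cutoff]
    return log2((mean(high) + pseudocount) / (mean(low) + pseudocount))


lfc = log2_fold_change([8, 8, 2, 2], [0.9, 0.8, 0.2, 0.1], pseudocount=0.0)
is_upregulated = lfc > 0.25  # fold-change threshold stated in the text
```

In the toy example the gene is four times more abundant in the high-score spots, so it passes the fold-change threshold; in the study a gene also had to pass the significance threshold (P < 0.01).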

Software for predictions of spatial transcriptional subtypes and aggressiveness

To make our trained image models accessible for future research, we developed GBM360 (https://gbm360.stanford.edu), a user-friendly software tool for the prediction and visualization of transcriptional subtypes and prognosis in GBM histology images (Fig. 5a). With GBM360, users can upload H&E histology images in the svs or tiff format (Fig. 5b, c). The software offers three key functionalities: (1) predicting transcriptional subtypes of GBM cells from histology images and visualizing the resulting spatial cellular maps (Fig. 5d), (2) predicting and visualizing regional aggressive scores (Fig. 5e), and (3) performing various statistical analyses for characterizing transcriptional subtype compositions, spatial cellular organization and subtype interactions (Fig. 5f, g, h).

**Fig. 5: Screenshots of the GBM360 software.**

Discussion

The emerging spatial transcriptomic technologies have enabled transcriptomic profiling while preserving the tissue architecture. In addition, high-resolution histology images are readily available in spatial transcriptomics, providing a key opportunity for the integration of molecular characteristics and histological features. Here, we integrated data from single-cell RNA-seq, spatial transcriptomics and histology images to resolve the spatial cellular heterogeneity in GBM. The results presented have the potential to improve our understanding of how spatial cellular architecture is associated with patient prognosis.

In the past decade, deep learning-based computational approaches have demonstrated great potential for revolutionizing tumor diagnosis. A number of deep learning algorithms, such as convolutional neural networks (CNNs), have been trained to extract intricate patterns and features from H&E histology sections. These algorithms have shown effectiveness in various aspects of tumor diagnosis, including tumor grading, subtyping, and prediction of patient outcomes^{20,21,22,23,24,25,26}. To obtain deeper biological insights from histology images, subsequent studies have applied deep learning to predict molecular traits, such as gene expression²⁷, mutations²⁸, copy number alterations²⁹ and hormone receptor status³⁰. He et al. developed ST-Net, a CNN-based algorithm, to predict the expression of spatially variable genes in breast cancer using histology images⁴⁷. Subsequent work by Zeng et al. applied Vision Transformer-based deep learning models to predict spatial gene expression in breast cancer⁴⁸. Although predicting the transcription levels of individual genes is important, it is also critical to consider the interactions of genes within modules, where a group of functionally related genes collectively defines the transcriptional states of malignant cells. In the current study, we integrated data from single-cell RNA-seq and spatial transcriptomics to identify spatially resolved gene expression programs in GBM. Our cNMF-based analysis led to the discovery of five meta-gene modules, including NPC-like, OPC-like, reactive astrocytes, MES-like, and MES-hypoxia. Comparative analysis of the top-scoring signatures with published gene expression modules showed that the detected cNMF modules were congruent with the existing classification of GBM cells^5,8,10,37,49.

Despite the rapid development of spatial technologies, their application as a routine diagnostic assay is limited by their high costs and the requirement of specialized expertise. In contrast, histology images are widely available and cheaper to obtain. In this study, we tested whether the transcriptional subtypes of GBM cells could be inferred from histology images. To tackle this challenge, we developed GBM-CNN, a deep-learning model that uses histology images to predict the transcriptional subtypes of GBM cells. The model was trained and evaluated using spatial transcriptomics data and subsequently validated in external testing cohorts. Using GBM-CNN, we phenotyped over 40 million tissue spots from 920 whole-slide images across two independent cohorts, enabling the computational reconstruction of high-resolution cellular maps in 410 GBM patients. Our analysis revealed that each tumor was composed of three to five malignant transcriptional subtypes, with over 75% of the tumors consisting of all five transcriptional subtypes, highlighting the intra-tumoral cell-state heterogeneity. Integrating the predicted cellular maps with clinical data led to the discovery of survival-associated spatial cellular compositions. Notably, a higher proportion of the MES-hypoxia subtype correlated with a worse prognosis. This result is supported by recent studies showing that the hypoxic environment drives metabolic alterations of tumor cells, leading to the accumulation of genomic instabilities and epigenetic disorders, ultimately driving increased aggressiveness and therapeutic resistance^50,51,52. By enabling automated detection of hypoxic regions in histology images, our model holds the potential to enhance diagnosis and facilitate personalized treatment strategies.

In addition to the composition of transcriptional subtypes, the spatial cellular organization plays a critical role in driving clonal evolution, tumor progression, and therapeutic resistance^12,13,14,15. Through the analysis of transcriptional subtype interactions and clustering coefficient, we found that a clustering pattern of the AC-like tumor cells was associated with poor patient prognosis. Conversely, when the AC-like tumor cells were dispersed and connected to the other subtypes, the prognosis was improved. In line with this result, a recent biological study showed that the GFAP+ astrocytoma cells frequently form ultra-long membrane protrusions, known as tumor microtubes (TMs)¹². TMs connect astrocytoma cells with each other, leading to the formation of multicellular anatomical networks¹². In vivo studies showed that the microtube-connected cellular networks are resistant to the cytotoxic effects of radiotherapy¹². In response to radiotherapy, microtube-connected cells are protected from cell death, while unconnected cells die in relevant numbers. In our results, the highly clustered reactive astrocytes may represent the radioprotective, microtube-connected astrocyte cellular networks. The presented results have the potential to improve our understanding of how the spatial cellular organization contributes to tumor evolution and disease progression.

To further identify survival-associated transcriptional programs, we developed a separate deep learning model that uses histology images to predict prognosis. The model was trained on the TCGA cohort and further tested in the CPTAC cohort. Applying the trained model to paired histology images of the spatial transcriptomics led to the identification of regional gene expression programs associated with prognosis. We identified both known gene expression markers for GBM prognosis, such as PTPRZ1⁴⁵ and EGFR⁴⁶, as well as markers that were previously uninvestigated, such as SNRPD3, TPST1 and GUCD1. Overall, genes upregulated in high-aggressive regions were related to glycoprotein metabolism, antigen processing and response to axon injury. These results were congruent with our observations that the reactive transformation of malignant cells in response to metabolic and inflammatory stimuli was associated with a worse prognosis.

Limitations include the resolution for classifications of transcriptional subtypes at both the cellular and spatial dimensions. Due to the limitations of platform sensitivity, our model predicted transcriptional subtypes at a patch level rather than at a single-cell level. While our deconvolution analysis revealed that each spot was predominantly occupied by one tumor cell type, it is important to note that the tumor cells were often intermixed with immune cells, such as T cells and macrophages. Previous studies utilizing image cytometry and single-cell RNA-seq have identified the existence of distinct subsets of T cells and macrophages within the tumor microenvironment^11,53. Notably, different immune cell subsets exhibited distinct functions in regulating tumor progression⁵⁴. However, due to the trade-off between resolution and accuracy, we did not differentiate between different subsets of these immune cells in our deconvolution analysis. Consequently, our prognosis analysis was limited to tumor cells without considering the contribution of these immune cells. Future investigations into the interactions between tumor cells and immune cells will substantiate our understanding of how the tumor microenvironment influences disease progression and therapeutic responses.

In summary, we proposed a machine-learning framework that integrates histology images, spatial transcriptomics and patient clinical outcomes. The proposed framework offers an efficient and cost-effective approach for characterizing intra-tumoral cellular heterogeneity in GBM. Our results linked tumor compositions and the spatial cellular organization to patient prognosis. Although we demonstrated the value of our framework in GBM, it can be extended to other diseases.

Methods

Preprocessing of the spatial transcriptomics data

We used four publicly available spatial transcriptomics datasets comprising both tumors and normal brain tissues (Supplementary Data 1)^16,18. All datasets were generated using the 10X Visium platform. Quality control was performed with the Cell Ranger pipeline, and the data were imported into AnnData objects using the Scanpy software (version 1.9). In each sample, we removed spots with fewer than 200 detected genes and more than 5% mitochondrial RNA. Additionally, genes detected in fewer than 3 spots were removed. Given the potential presence of batch effects in spatial transcriptomics data, we performed normalization and variance stabilization across different samples using regularized negative binomial regression⁵⁵. We regressed out the percentage of mitochondria-expressed genes per spot and cell-cycle effects. This approach allowed us to remove the influence of technical variance from downstream analyses while preserving biological heterogeneity.

CNA inference and prediction of tumor cell content

We used the InferCNV method⁵⁶ to estimate copy number alterations (CNAs) of each spot from GBM tissues, where the transcriptomes of a separate cohort of normal brains (n = 6 tissues from n = 3 patients) were used as a reference⁵⁷. We calculated an average gene expression value over a chromosomal window (default = 100 genes) across each analyzed gene/chromosomal region in GBM tissues and compared the value to its counterpart in normal brains. The output from InferCNV was a two-dimensional matrix indicating the CNA score of each chromosomal window in each spot. We then rescaled the CNA matrix, such that at each chromosomal window, the CNA score of normal brains ranged between 0.98 and 1.02, with an average score of 1.00. Compared to normal brains, tumor tissues exhibited broad CNAs across the genome, such as gain of Chr 7 and loss of Chr 10. We then selected a signature CNA event in each tumor that was shared across all its spots. To define tumor signature CNAs, we required the average CNA score to be >1.05 if the signature was a chromosomal gain or <0.95 if the signature was a chromosomal loss. The tumor cell content $C$ for a given spot $i$ was defined as $C_{i}=\frac{A_{\mathrm{CNV},i}-1}{\max\left(A_{\mathrm{CNV}}\right)-1}$ if the signature was a chromosomal gain and $C_{i}=\frac{1-A_{\mathrm{CNV},i}}{1-\min\left(A_{\mathrm{CNV}}\right)}$ if the signature was a chromosomal loss. At least three tumor signature events were used in each tumor, and the average results were calculated to ensure robust and unbiased estimation. Spots with $C$ > 0.2 were defined as malignant spots and retained for downstream analysis.
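As an illustration of the rescaling described above, the following minimal sketch computes the tumor cell content for individual signature events and averages it across events. The function names, the plain-list inputs and the toy scores are assumptions made for this example, as is the reading of the CNA term as the per-spot score of the signature event within one tumor; this is not the code used in the study.

```python
def tumor_cell_content(spot_scores, signature):
    """Rescale per-spot CNA scores of one signature event into tumor cell content.

    spot_scores: CNA score of the signature event in every spot of one tumor.
    signature: "gain" or "loss" (labels chosen for this sketch).
    """
    if signature == "gain":
        denominator = max(spot_scores) - 1.0
        numerators = [score - 1.0 for score in spot_scores]
    elif signature == "loss":
        denominator = 1.0 - min(spot_scores)
        numerators = [1.0 - score for score in spot_scores]
    else:
        raise ValueError("signature must be 'gain' or 'loss'")
    if denominator <= 0:
        raise ValueError("no spot deviates from the normal-brain baseline of 1.0")
    return [value / denominator for value in numerators]


def malignant_spots(content_per_event, threshold=0.2):
    """Average the content over signature events and flag spots with C > threshold."""
    n_events = len(content_per_event)
    n_spots = len(content_per_event[0])
    mean_content = [
        sum(event[i] for event in content_per_event) / n_events
        for i in range(n_spots)
    ]
    return mean_content, [c > threshold for c in mean_content]


# Toy example: three spots scored on three hypothetical signature events.
event_a = tumor_cell_content([1.00, 1.10, 1.20], "gain")
event_b = tumor_cell_content([1.00, 0.90, 0.80], "loss")
event_c = tumor_cell_content([1.00, 1.05, 1.10], "gain")
mean_content, is_malignant = malignant_spots([event_a, event_b, event_c])
```

In this toy case the first spot sits at the normal-brain baseline and is excluded, while the other two spots exceed the 0.2 cutoff and would be retained.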

Consensus non-negative matrix factorization (cNMF)

To identify gene expression programs (i.e., meta-gene modules) that govern the transcriptional phenotypes of malignant cells, we used the cNMF algorithm (version 1.3.4)³². We aimed to generate an unbiased classification of transcriptional subtypes across patients, without assuming that every transcriptional subtype is present in every patient or that a given tumor includes all transcriptional subtypes. Therefore, we ran the cNMF using transcriptomes pooled from all patients. Given the high levels of inter-patient heterogeneity, it is possible that some transcriptional subtypes were present in only a subset of the patients but missing in others. Non-negative matrix factorization was run 200 times for k clusters, where k ranged from 2 to 15. The optimal k value was selected by finding the most stable clustering solution, in which the maximum clustering stability and lowest error rate were found.

Gene set analysis

Gene set analysis was performed with the clusterProfiler R package (version 4.2.1)⁵⁸. We selected the top 100 scoring genes of each meta-gene module, and hypergeometric testing was used to identify enriched biological processes in the Gene Ontology (GO). To determine significant biological processes, we set the P value cutoff to 0.05 and the Q value cutoff to 0.20.

Spatially weighted correlation analysis

To correlate the cNMF modules defined in our study with published gene expression modules, we performed spatially weighted correlation analysis. We first scored each module in each spot using the AddModuleScore function from the Seurat R package (version 4.3.0)⁵⁹. We then correlated any two sets of modules by geographically weighted regression with the GWmodel package (version 2.2)⁶⁰.

Preprocessing and integration of single-cell RNA-seq data

To include both tumor cells and immune cells as references for deconvolution analysis, we integrated three single-cell RNA-seq datasets: GSE131928, GSE163108, and GSE84465 (Supplementary Data 1)^5,11,37. GSE131928 predominantly contains tumor cells, while GSE163108 predominantly contains immune cells. GSE84465 contains both tumor and immune cell types. We performed preprocessing and batch effect normalization following the same procedures as outlined in the previous section. Specifically, we used the SCTransform algorithm to regress out percentages of mitochondria-expressed genes per cell and cell-cycle effects⁵⁵. To stratify tumor cells into different transcriptional subtypes, we used the gene expression modules derived from GSE131928, which comprised GBM cells from 28 patients. Module scores were calculated for each tumor cell using the AddModuleScore function from the Seurat R package (version 4.3.0)⁵⁹, and the cell type was assigned based on the module with the highest score. For the subsequent deconvolution analysis, we randomly selected 20% of cells from each transcriptional subtype as the reference. In the case of immune cells, we randomly sampled 200 CD4 T cells and 200 CD8 T cells from GSE163108 and combined them with the immune cells from the other two datasets. Considering a balanced trade-off between resolution and accuracy of deconvolution, we merged CD4 and CD8 T cells into a single cell-type label. To generate a UMAP visualization of the integrated single-cell dataset, we normalized the total counts across all genes to ensure that every cell had the same total counts after normalization. The number of neighbors was set to 15, and the neighborhood graph was embedded into two dimensions using UMAP. The visualization was generated using the sc.pl.umap() function of the Scanpy software (version 1.9).
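The label assignment and reference sampling described above can be sketched as follows. The helper names, the toy scores and the fixed random seed are assumptions for illustration; the study computed module scores with Seurat's AddModuleScore, which is not reproduced here.

```python
import random
from collections import defaultdict


def assign_subtype(module_scores):
    """Return the module with the highest score for one cell."""
    return max(module_scores, key=module_scores.get)


def sample_reference(cell_labels, fraction=0.2, seed=0):
    """Randomly keep a fixed fraction of cells from each subtype as the reference."""
    rng = random.Random(seed)
    cells_by_subtype = defaultdict(list)
    for cell_id, label in cell_labels.items():
        cells_by_subtype[label].append(cell_id)
    reference = {}
    for label, cells in cells_by_subtype.items():
        n_keep = max(1, round(fraction * len(cells)))
        reference[label] = rng.sample(sorted(cells), n_keep)
    return reference


# Toy module scores for a single cell (values are made up).
toy_scores = {"AC-like": 0.10, "MES-like": 0.70, "NPC-like": -0.20, "OPC-like": 0.05}
label = assign_subtype(toy_scores)

# Toy label table: 10 cells per subtype, from which 20% are sampled.
toy_labels = {f"cell_{i}": ("AC-like" if i < 10 else "MES-like") for i in range(20)}
reference = sample_reference(toy_labels)
```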

Align single-cell RNA-seq data to spatial transcriptomics

To deconvolute the spots obtained from spatial transcriptomics, we first determined the number of cells present in each spot. For this purpose, we performed nuclei segmentation on H&E-stained histology images using the StarDist algorithm (version 0.8.3, https://github.com/stardist/stardist)⁶¹. The accuracy of segmentation was confirmed through visual inspection. To estimate the fractions of different cell types within each spot, we constructed a reference dataset using the single-cell RNA-seq data, as described in the section above. To map the single cells to spots, we used the Tangram algorithm (version 1.0.4)⁶². In this process, we selected the top 200 differentially expressed genes between different single-cell clusters as training genes. To validate the deconvolution results, we utilized another set of 200 genes as testing genes, and we calculated the alignment score for each gene.

Image processing and data augmentation

Spatial transcriptomics cohort

For the training and internal validation of GBM-CNN, we used data from the spatial transcriptomics cohorts. All images were taken at 20x magnification. We enhanced the brightness and contrast of each image by 1.5 times, and the quality of the images was confirmed by visual inspection. Given the variation in H&E staining colors across different samples, we performed stain normalization using StainTools (version 2.1.2, https://github.com/Peter554/StainTools). For this purpose, we randomly selected 20 histology images from the TCGA-GBM cohort as references, and we normalized the stain colors of each image of the spatial transcriptomics to match those of each reference. The average image features derived from all references were used for further analysis.

To extract patches, we centered each patch at the spatial coordinate of its corresponding gene expression spot. Each patch had an edge length of 56 μm, which was approximately equal to the diameter (55 μm) of a single gene expression spot. The Pillow imaging library (version 9.2.0) was used for patch extraction, and the extracted patches were resized to 224 × 224 pixels. During the training phase of GBM-CNN, we performed image augmentation, including random horizontal and vertical flipping 50% of the time, as well as random adjustments of brightness (factor = 0.25), contrast (factor = 0.25), and saturation (factor = 0.25).

TCGA, CPTAC and IvyGAP cohorts

Histology images of the TCGA-GBM cohort were obtained from the Genomic Data Commons (GDC) portal using a Data Transfer Tool Client (https://gdc.cancer.gov/access-data/gdc-data-transfer-tool). Histology images of the CPTAC-GBM cohort were downloaded from the Cancer Image Archive (https://www.cancerimagingarchive.net/collections). The accession URLs are listed in Supplementary Data 1. Histology images of the IvyGAP cohort were downloaded from the Ivy Glioblastoma Atlas Project (https://glioblastoma.alleninstitute.org) using the “Requests” HTTP library (version 2.31) in Python.

We used whole-slide images (WSIs) of formalin-fixed, paraffin-embedded (FFPE) diagnostic slides captured at 20x magnification, corresponding to a pixel resolution of 0.5 $\mu m$/pixel. To separate tissue sections (foreground) from the white background, we applied an Otsu segmentation mask to each WSI. Given the observed variation in H&E stain colors between the TCGA and CPTAC cohorts, we performed stain normalization using StainTools (version 2.1.2, https://github.com/Peter554/StainTools). We randomly selected 20 histology images from the TCGA cohort as references and normalized the color of each CPTAC image to match the stain colors of each reference. The average image features derived from all reference images were then used for subsequent analysis.

To extract patches, we used the OpenSlide library (Python API, version 1.2.0)⁶³. Each patch had an edge length of 56 $\mu m$ (112 $\times$ 112 pixels), which was consistent with the patch size extracted from the spatial transcriptomics cohort. Each patch was subsequently resized to 224 $\times$ 224 pixels as input to the model. If a patient had multiple slides, we included all of them in the presented analysis.
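The patch dimensions follow directly from the pixel resolution. The sketch below reproduces this arithmetic and enumerates tile origins on a regular grid; the non-overlapping grid, the toy slide size and the function names are assumptions for illustration, since the tiling stride is not specified here.

```python
def patch_edge_pixels(edge_um=56.0, um_per_pixel=0.5):
    """Convert a physical patch edge length into pixels at the given resolution."""
    return round(edge_um / um_per_pixel)


def tile_origins(slide_width, slide_height, edge_px):
    """Yield top-left (x, y) corners of non-overlapping tiles that fit in the slide."""
    for y in range(0, slide_height - edge_px + 1, edge_px):
        for x in range(0, slide_width - edge_px + 1, edge_px):
            yield x, y


edge_px = patch_edge_pixels()    # 56 um at 0.5 um/pixel
upscale = 224 / edge_px          # resize factor applied before the model
origins = list(tile_origins(448, 224, edge_px))  # toy 448 x 224 pixel region
```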

Architecture of GBM-CNN and training algorithm

All deep learning models were implemented with the PyTorch library (version 2.0). To extract histology features from each 224 $\times$ 224 pixel patch, we used a ResNet-50 module⁶⁴. Each patch was mapped to a feature vector of size 2048. To enhance model performance, we adopted a Transfer Learning approach in which the ResNet-50 module was initialized with weights pre-trained on ImageNet⁶⁵. During the training phase, we selected the last two layers of the ResNet blocks to fine-tune while freezing the other ResNet blocks. The feature vector was then converted to a probability vector through a fully connected layer. To optimize model weights, we used ADAM⁶⁶ as the optimizer and the cross-entropy as the loss function. The training parameters were selected empirically, with the mini-batch size set to 64, the learning rate set to 5e-4, and the weight decay set to 1e-5. The model was evaluated using leave-one-out cross-validation (LOOCV). In each iteration, the model was trained for five epochs on the training samples and validated using the sample left out for validation. Following the LOOCV, we trained a final model using data from all samples (n = 23) in the cohort. This final model was used to predict transcriptional subtypes in images obtained from external cohorts (i.e., TCGA, CPTAC, IvyGAP).
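The sample-level leave-one-out scheme can be sketched as below. The sample identifiers and the `leave_one_out_splits` helper are hypothetical, and the training and validation steps are left as comments rather than implemented.

```python
def leave_one_out_splits(sample_ids):
    """Yield (training_samples, held_out_sample) pairs, one per sample."""
    for held_out in sample_ids:
        training = [s for s in sample_ids if s != held_out]
        yield training, held_out


# Hyperparameters as reported in the text.
config = {"batch_size": 64, "learning_rate": 5e-4, "weight_decay": 1e-5, "epochs": 5}

sample_ids = [f"sample_{i:02d}" for i in range(1, 24)]  # 23 placeholder samples
splits = list(leave_one_out_splits(sample_ids))
for training, held_out in splits:
    # Training on `training` with `config` and validating on `held_out`
    # would happen here; neither step is defined in this sketch.
    pass
```

Splitting at the sample level, rather than at the patch level, keeps all patches of the held-out tissue out of the training set in each iteration.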

Infer cell-type proportions from bulk RNA-seq data

To estimate the fraction of each transcriptional subtype from the bulk RNA-seq, we used the CIBERSORTx algorithm⁶⁷. To construct a signature gene expression matrix, we randomly selected 100 cells from each transcriptional subtype from our integrated single-cell RNA-seq dataset (refer to the “Preprocessing and integration of single-cell RNA-seq data” section for details). For the bulk RNA-seq data, we obtained the raw gene expression counts of the TCGA-GBM cohort from the UCSC Xena browser⁶⁸. We performed the deconvolution analysis using the default parameters of CIBERSORTx, which provided us with the estimated proportions of each transcriptional subtype in each patient.

To establish a correlation between the fraction of transcriptional subtypes obtained from the deconvolution analysis of bulk RNA-seq and those predicted from histology images, we focused exclusively on frozen tissues from the TCGA cohort (n = 338 slides from n = 166 patients). These frozen tissue slides were derived from the same resected regions as the tissues used for bulk RNA-seq analysis. For all other analyses, the FFPE tissues were used as stated in the previous section.

Extraction of histological features

To assess whether the transcriptional subtypes correlated with any raw histological features, we extracted (1) the color values from each image channel, including their mean and quantiles, (2) histogram features and (3) texture features. The histogram features quantified histogram counts of color channel values, while texture features characterized the different combinations of distance and angle between pixels. The combined feature dimension was 105 $\times$ 1. The feature extraction was performed with the Squidpy Python library (version 1.2.2)⁶⁹.

Image-based aggressive score predictions

Model architecture and training algorithm

Similar to GBM-CNN, we used a ResNet-50 module⁶⁴ to extract histopathological features from patches, and each patch was converted into a one-dimensional feature vector $Z$, where the size of $Z$ was 2,048. The feature vector $Z$ was then mapped to an aggressive score through a fully connected layer implementing the Cox loss.

The goal of survival prediction is to predict the likelihood that the patient will survive until time $t$ given patient features $Z$. We used the Cox proportional hazards model to predict patient survival based on the feature vector, where the hazard function was $\lambda(t|Z)=\lambda_{0}(t)\exp(Z\cdot\beta)$. Here, $\lambda_{0}(t)$ was the baseline hazard function, and $\beta$ was the corresponding coefficient weight implemented in the fully connected layer. The Cox model was able to include censored data in cases where the death time of some patients was unknown (either they were still alive, or we lost track of their information at a certain time point). Let $Z_{i}$ be the features of patient $i$, $Y_{i}$ be the survival time, and $C_{i}$ be the censor indicator, such that $C_{i}=\delta_{\mathrm{event}\,i=\mathrm{death}}$. The negative log-likelihood to minimize (or Cox loss) was

$$\mathcal{L}(\beta)=-\sum_{i\,|\,C_{i}=1}\left(Z_{i}\beta-\log\left(\sum_{j\,|\,Y_{j}\ge Y_{i}}e^{Z_{j}\beta}\right)\right).$$

(1)

We adapted this loss to a deep learning framework. Since ${Z}_{i}$ was the extracted features of patient $i$, ${Z}_{i}\beta$ can be represented by ${f}_{\theta }\left({X}_{i}\right)$ in the neural network setting, where ${X}_{i}$ was the input predictor of patient $i$, $f$ denoted a nonlinear mapping the neural network learns to first extract patient features from the predictor and to finally predict patient risk, and $\theta$ denoted the model parameters including the weights and biases of each neural network layer. Our objective to minimize was

$$\mathscr{L}(\theta)=-\sum_{i\,|\,C_{i}=1}\left(f_{\theta}(X_{i})-\log\left(\sum_{j\,|\,Y_{j}\ge Y_{i}}e^{f_{\theta}(X_{j})}\right)\right).$$

(2)

In practice, we cannot compute the sum $\sum_{j\,|\,Y_{j}\ge Y_{i}}e^{f_{\theta}(X_{j})}$ over all patients. We therefore adopted a batch sampling strategy and computed this sum over the patients within each batch.
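A minimal sketch of the batch-level objective is given below in plain Python. The function name, the toy batch and the log-sum-exp stabilization are choices made for this example; in a deep learning framework the same computation would be expressed with differentiable tensor operations, which is not shown here.

```python
import math


def cox_batch_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood computed within one mini-batch.

    risk_scores: model outputs f_theta(X_i), one per patient in the batch.
    times: survival or follow-up times Y_i.
    events: indicators C_i (1 = death observed, 0 = censored).
    """
    loss = 0.0
    for score_i, time_i, event_i in zip(risk_scores, times, events):
        if event_i != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: batch members still at risk at time Y_i.
        risk_set = [s for s, t in zip(risk_scores, times) if t >= time_i]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(risk_set)
        log_sum = top + math.log(sum(math.exp(s - top) for s in risk_set))
        loss -= score_i - log_sum
    return loss


# Toy batch: two patients with equal scores, both with observed deaths.
toy_loss = cox_batch_loss([0.0, 0.0], [1.0, 2.0], [1, 1])
```

For this toy batch the loss equals log 2, contributed entirely by the patient with the earlier event, whose risk set contains both patients.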

For training the model, we initialized the ResNet-50 module with the weights of a model pretrained on ImageNet⁶⁵, and we selected the last two layers of the ResNet blocks to fine-tune while freezing the other blocks. The training was patch-based, and the model aimed at predicting aggressive scores for patches. In testing, we averaged the aggressive scores of all patches from a patient to obtain the final aggressive score. Since training a model using all patches from a WSI could be computationally expensive, we randomly selected 200 patches from each patient. This number was determined based on a balanced consideration between performance and computational time after testing a range of different numbers of patches. We used ADAM⁶⁶ as the optimizer with the Cox loss (Eq. 2) as the loss function to optimize the model weights. The mini-batch size was set to 128 and the learning rate was 5e-4.

Evaluation of algorithm

We initially considered two standard evaluation metrics for testing the performance of a prognosis prediction model: (1) the concordance index (C-index)^41,42 and (2) the integrated Brier score (IBS)⁴³. The C-index is a performance measure that evaluates how well the predicted aggressive score ranks patients according to their actual survival time. It was calculated by dividing the number of pairs of subjects whose predicted risks are correctly ordered by the number of admissible pairs of subjects. A pair is considered admissible if neither event in the pair is censored, or the earlier time in the pair is not censored. A value of 1.0 indicates perfect prediction where all the pairs are correctly ordered, and a value of 0.5 indicates random prediction. The Brier score was calculated as the squared difference between the observed survival status and the predicted survival probability at a given time point. The IBS provided an overall evaluation of the model performance at all available times. In contrast to the C-index, an IBS close to 0.0 indicates good prediction, while an IBS close to 1.0 indicates poor prediction.
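The pairwise definition of the C-index can be made concrete with a short sketch. The implementation below is written for illustration and is not the lifelines routine used in the study; it assumes that a higher aggressive score should correspond to a shorter survival time, skips pairs with tied times, and gives half credit to tied scores, all of which are conventions chosen for this example.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of admissible pairs whose predicted risks are correctly ordered.

    times: survival or follow-up times.
    events: 1 if death was observed, 0 if censored.
    risk_scores: predicted scores, higher meaning higher risk.
    """
    concordant = 0.0
    admissible = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # tied times are skipped in this simplified version
            first, second = (i, j) if times[i] < times[j] else (j, i)
            if events[first] != 1:
                continue  # earlier time is censored: pair is not admissible
            admissible += 1
            if risk_scores[first] > risk_scores[second]:
                concordant += 1.0
            elif risk_scores[first] == risk_scores[second]:
                concordant += 0.5
    if admissible == 0:
        raise ValueError("no admissible pairs")
    return concordant / admissible


perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
reversed_order = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
```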

Since GBM is a highly aggressive cancer type, and patient survival time is relatively short and homogeneous, the IBS is a more relevant evaluation metric than the C-index⁴⁴. Therefore, we derived a composite score (CS) that integrated both the IBS and the C-index:

$$\mathrm{CS}=\frac{\text{C-index}+\left(1-\mathrm{IBS}\right)}{2}.$$

(3)
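As a worked example of Eq. 3, the composite score can be computed as below. The function name and the input values are placeholders and do not correspond to results reported in the study.

```python
def composite_score(c_index, ibs):
    """Average of the C-index and (1 - IBS); higher values indicate better models."""
    return (c_index + (1.0 - ibs)) / 2.0


# Placeholder inputs: a random-ranking model and a perfect model.
chance_level = composite_score(c_index=0.5, ibs=0.5)
ideal = composite_score(c_index=1.0, ibs=0.0)
```

Because the IBS is subtracted from 1 before averaging, both terms point in the same direction and the score stays within 0 and 1.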

The C-index was calculated using the “lifelines” package (version 0.27.4) in Python, and the IBS was calculated using the “survcomp” package (version 3.16) in R.

Spatial statistical analysis

Spatial neighborhood graph

To characterize the spatial cellular organization, we first built a spatial neighborhood graph on each WSI, where nodes are patches and edges are direct interactions between the patches. We used the spatial coordinates of each patch to identify its neighbors. We defined the neighbors of a patch as those patches located within a two-patch distance (a maximum of 24 neighbors within the surrounding 5 × 5 patch window). The class phenotype of each patch is the predicted transcriptional subtype. The neighborhood graph can be denoted as $G=\left(V,{E}\right)$, where $V$ represents vertices (nodes) and $E$ represents edges between the vertices. The neighborhood graph was implemented using the Python NetworkX library⁷⁰.
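The neighborhood definition can be sketched without any graph library as follows. The sketch assumes that patches are indexed by integer (row, column) grid coordinates and that a two-patch distance means at most two grid steps in each direction, which matches the stated maximum of 24 neighbors; the function name and the plain edge set are choices made for this example rather than the NetworkX implementation used in the study.

```python
def build_neighborhood_edges(patch_labels, radius=2):
    """Return undirected edges between patches within `radius` grid steps.

    patch_labels: dict mapping (row, col) grid coordinates to a predicted subtype.
    Each edge is a sorted pair of coordinates, so every edge appears once.
    """
    edges = set()
    for (row, col) in patch_labels:
        for d_row in range(-radius, radius + 1):
            for d_col in range(-radius, radius + 1):
                if d_row == 0 and d_col == 0:
                    continue  # a patch is not its own neighbor
                neighbor = (row + d_row, col + d_col)
                if neighbor in patch_labels:
                    edges.add(tuple(sorted(((row, col), neighbor))))
    return edges


# Toy 5 x 5 grid of patches that all carry the same label.
grid = {(r, c): "AC-like" for r in range(5) for c in range(5)}
edges = build_neighborhood_edges(grid)
center_degree = sum(1 for edge in edges if (2, 2) in edge)
corner_degree = sum(1 for edge in edges if (0, 0) in edge)
```

The central patch reaches the maximum of 24 neighbors, whereas a corner patch has only 8 because most of its window falls outside the tissue grid.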

Clustering coefficient

The clustering coefficient measures how strongly nodes of a specific class tend to cluster together. It is defined as the ratio of the number of interactions ($I$) between members of the class to the number of all interactions that include a member of that class:

$$\mathrm{Cluster}_{(m)}=\frac{I(M,M)}{I(M,M)+I(M,K)}\in\left[0,1\right]$$

(4)

where $K$ represents any class that is not class $M$.
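A minimal sketch of this ratio on a toy graph is shown below. The edge list, node names and the choice to return 0.0 when the class has no edges are assumptions made for illustration.

```python
def clustering_coefficient(edges, labels, target):
    """Share of edges touching `target` nodes that connect two `target` nodes."""
    same_class = 0
    mixed = 0
    for node_u, node_v in edges:
        u_is_target = labels[node_u] == target
        v_is_target = labels[node_v] == target
        if u_is_target and v_is_target:
            same_class += 1   # counts towards I(M, M)
        elif u_is_target or v_is_target:
            mixed += 1        # counts towards I(M, K)
    total = same_class + mixed
    return same_class / total if total else 0.0


# Toy chain of four patches: a - b - c - d.
toy_labels = {"a": "AC-like", "b": "AC-like", "c": "MES-like", "d": "MES-like"}
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
ac_cluster = clustering_coefficient(toy_edges, toy_labels, "AC-like")
```

Here the AC-like class has one within-class edge and one edge to another class, giving a coefficient of 0.5; a fully segregated class would score 1 and a fully dispersed class 0.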

Interaction matrix

The interaction matrix represents the number of edges between any two malignant cell types ($M$ and $K$) divided by the total number of edges between all malignant cell types in the graph:

$$\mathrm{Interaction}_{(m,k)}=\frac{I(M,K)}{\sum_{i,j\in V}I(i,j)}.$$

(5)
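The normalization in Eq. 5 can be illustrated with the short sketch below. It assumes that every node in the toy graph is a malignant patch and stores the symmetric matrix as a dictionary keyed by sorted label pairs; both choices, and the function name, are made for this example.

```python
from collections import Counter


def interaction_matrix(edges, labels):
    """Fraction of all edges that connect each unordered pair of classes."""
    counts = Counter()
    for node_u, node_v in edges:
        pair = tuple(sorted((labels[node_u], labels[node_v])))
        counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


# Toy chain of four patches: a - b - c - d.
toy_labels = {"a": "AC-like", "b": "AC-like", "c": "MES-like", "d": "MES-like"}
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
interactions = interaction_matrix(toy_edges, toy_labels)
```

Each of the three class pairs accounts for one third of the edges in this toy graph, and the entries sum to 1 by construction.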

Generation of abstractive networks

To enhance the visualization of spatial cellular interactions, we generated abstractive networks based on the predicted cellular maps. We first selected a region of interest (e.g., a region with clusters of the AC-like subtype) from the spatial neighborhood graph of the image. Then, for each node in the selected region, we inspected its neighboring nodes. If over 70% of the neighboring nodes belonged to the same transcriptional subtype as the target node, we aggregated all nodes in this neighborhood into one node. This process was repeated for every node in the selected region until convergence. The resulting abstractive networks were visualized using the networkD3 library (version 0.4).

Survival analysis

Molecular characteristics and clinical endpoints of GBM patients were obtained from published studies of the TCGA^71,72 and CPTAC⁴⁰ cohorts. Multivariate Cox regression analysis and the log rank test were performed using the lifelines package (version 0.27.4) in Python (version 3.9)⁷³. In Cox regression analysis, we included gender, age, tumor size and IDH subtype as covariates. The P values were adjusted for multiple testing using the Benjamini-Hochberg method.
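For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic implementation of the standard step-up procedure written for illustration, with made-up P values; the study does not state which routine was used for the adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
```

For these toy inputs the adjusted values are 0.02, 0.04, 0.04 and 0.02, so the two smallest raw P values share the same adjusted value after the monotonicity step.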

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets analyzed in the current study, including spatial transcriptomics, single-cell RNA-seq and histology images, are publicly accessible, with the accession URLs listed in Supplementary Data 1. The single-cell RNA-seq data were obtained from the GEO database under the following accession numbers: GSE131928⁵, GSE163108¹¹, GSE84465³⁷. The publicly available spatial transcriptomics data were acquired using the following accession URLs: (1) Datadryad [https://doi.org/10.5061/dryad.h70rxwdmj]¹⁶; (2) Figshare [https://doi.org/10.6084/m9.figshare.20653908.v3]¹⁸; (3) 10X Genomics [https://www.10xgenomics.com/resources/datasets/human-glioblastoma-whole-transcriptome-analysis-1-standard-1-2-0]; (4) LIBD [http://research.libd.org/spatialLIBD]³¹. The in-situ RNA hybridization data were obtained from the Ivy Glioblastoma Atlas Project using the accession URL [https://glioblastoma.alleninstitute.org]³⁸. The publicly available histology images of the TCGA-GBM cohort were downloaded from the GDC data portal [https://portal.gdc.cancer.gov/projects/TCGA-GBM], and the bulk RNA-seq data were obtained from the UCSC Xena browser [https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-GBM.htseq_counts.tsv.gz]³⁹. The publicly available histology images of the CPTAC-GBM cohort were downloaded from the Cancer Image Archive with the accession URL [https://www.cancerimagingarchive.net/collections], and the publicly available clinical data were obtained from the GDC data portal [https://portal.gdc.cancer.gov/projects/CPTAC-3]⁴⁰. The remaining data are available within the Article, Supplementary Information or Source Data file. Source data are provided with this paper.

Code availability

The source codes of the GBM360 software used to perform the analyses presented in this manuscript are available on GitHub at https://github.com/gevaertlab/GBM360 and Zenodo at https://doi.org/10.5281/zenodo.8051305⁷⁴.

References

Ostrom, Q. T. et al. CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2012-2016. Neuro. Oncol. 21, v1–v100 (2019).
Stupp, R. et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N. Engl. J. Med. 352, 987–996 (2005).
Larsson, I. et al. Modeling glioblastoma heterogeneity as a dynamic network of cell states. Mol. Syst. Biol. 17, e10105 (2021).
Yabo, Y. A., Niclou, S. P. & Golebiewska, A. Cancer cell heterogeneity and plasticity: A paradigm shift in glioblastoma. Neuro. Oncol. 24, 669–682 (2022).
Neftel, C. et al. An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. Cell 178, 835–849.e21 (2019).
Richards, L. M. et al. Gradient of Developmental and Injury Response transcriptional states defines functional vulnerabilities underpinning glioblastoma heterogeneity. Nat. Cancer 2, 157–173 (2021).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
Tirosh, I. et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 539, 309–313 (2016).
Venteicher, A. S. et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science 355, eaai8478 (2017).
Couturier, C. P. et al. Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. Nat. Commun. 11, 3406 (2020).
Mathewson, N. D. et al. Inhibitory CD161 receptor identified in glioma-infiltrating T cells by single-cell analysis. Cell 184, 1281–1298.e26 (2021).
Osswald, M. et al. Brain tumour cells interconnect to a functional and resistant network. Nature 528, 93–98 (2015).
Venkataramani, V. et al. Glutamatergic synaptic input to glioma cells drives brain tumour progression. Nature 573, 532–538 (2019).
Venkatesh, H. S. et al. Electrical and synaptic integration of glioma into neural circuits. Nature 573, 539–545 (2019).
Hara, T. et al. Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma. Cancer Cell 39, 779–792.e11 (2021).
Ravi, V. M. et al. Spatially resolved multi-omics deciphers bidirectional tumor-host interdependence in glioblastoma. Cancer Cell 40, 639–655.e13 (2022).
Coy, S. et al. Single cell spatial analysis reveals the topology of immunomodulatory purinergic signaling in glioblastoma. Nat. Commun. 13, 4814 (2022).
Ren, Y. et al. Spatial transcriptomics reveals niche-specific enrichment and vulnerabilities of radial glial stem-like cells in malignant gliomas. Nat. Commun. 14, 1028 (2023).
Aeffner, F. et al. Introduction to digital image analysis in whole-slide imaging: A white paper from the digital pathology association. J. Pathol. Inform. 10, 9 (2019).
Baxi, V., Edwards, R., Montalto, M. & Saha, S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod. Pathol. 35, 23–32 (2022).
Coudray, N. & Tsirigos, A. Deep learning links histology, molecular signatures and prognosis in cancer. Nat. Cancer 1, 755–757 (2020).
Kiehl, L. et al. Deep learning can predict lymph node status directly from histology in colorectal cancer. Eur. J. Cancer 157, 464–473 (2021).
Lu, Z. et al. Deep-learning-based characterization of tumor-infiltrating lymphocytes in breast cancers from histopathology images and multiomics data. JCO Clin. Cancer Inf. 4, 480–490 (2020).
Wang, X. et al. Spatial interplay patterns of cancer nuclei and tumor-infiltrating lymphocytes (TILs) predict clinical benefit for immune checkpoint inhibitors. Sci. Adv. 8, eabn3966 (2022).
Jaber, M. I. et al. A deep learning image-based intrinsic molecular subtype classifier of breast tumors reveals tumor heterogeneity that may affect survival. Breast Cancer Res 22, 12 (2020).
Wang, Y. et al. Improved breast cancer histological grading using deep learning. Ann. Oncol. 33, 89–98 (2022).
Schmauch, B. et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).
Bilal, M. et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit Health 3, e763–e772 (2021).
Xu, Z. et al. Deep learning predicts chromosomal instability from histopathology images. iScience 24, 102394 (2021).
Naik, N. et al. Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains. Nat. Commun. 11, 5727 (2020).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. Elife 8, e43803 (2019).
Nazir, F. H. et al. Expression and secretion of synaptic proteins during stem cell differentiation to cortical neurons. Neurochem. Int. 121, 38–49 (2018).
Henrik Heiland, D. et al. Tumor-associated reactive astrocytes aid the evolution of immunosuppressive environment in glioblastoma. Nat. Commun. 10, 2541 (2019).
Liddelow, S. A. & Barres, B. A. Reactive astrocytes: Production, function, and therapeutic potential. Immunity 46, 957–967 (2017).
Liddelow, S. A. et al. Neurotoxic reactive astrocytes are induced by activated microglia. Nature 541, 481–487 (2017).
Darmanis, S. et al. Single-cell RNA-seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma. Cell Rep. 21, 1399–1410 (2017).
Puchalski, R. B. et al. An anatomic transcriptional atlas of human glioblastoma. Science 360, 660–663 (2018).
Brennan, C. W. et al. The somatic genomic landscape of glioblastoma. Cell 157, 753 (2014).
Wang, L.-B. et al. Proteogenomic and metabolomic characterization of human glioblastoma. Cancer Cell 39, 509–528.e20 (2021).
Harrell, F. E. Evaluating the yield of medical tests. JAMA 247, 2543–2546 (1982).
Longato, E., Vettoretti, M. & Di Camillo, B. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J. Biomed. Inform. 108, 103496 (2020).
Graf, E., Schmoor, C., Sauerbrei, W. & Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18, 2529–2545 (1999).
Steyaert, S. et al. Multimodal deep learning to predict prognosis in adult and pediatric brain tumors. Commun. Med. (Lond.) 3, 44 (2023).
Shi, Y. et al. Tumour-associated macrophages secrete pleiotrophin to promote PTPRZ1 signalling in glioblastoma stem cells for tumour growth. Nat. Commun. 8, 15080 (2017).
Li, J., Liang, R., Song, C., Xiang, Y. & Liu, Y. Prognostic significance of epidermal growth factor receptor expression in glioma patients. Onco. Targets Ther. 11, 731–742 (2018).
He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827–834 (2020).
Zeng, Y. et al. Spatial transcriptomics prediction from histology jointly through Transformer and graph neural networks. Brief. Bioinform 23, bbac297 (2022).
Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318–1323 (2017).
Chou, C.-W. et al. Tumor cycling hypoxia induces chemoresistance in glioblastoma multiforme by upregulating the expression and function of ABCB1. Neuro. Oncol. 14, 1227–1238 (2012).
Johnson, K. C. et al. Single-cell multimodal glioma analyses identify epigenetic regulators of cellular plasticity and environmental stress response. Nat. Genet. 53, 1456–1468 (2021).
Qiu, G.-Z. et al. Reprogramming of the tumor in the hypoxic niche: The emerging concept and associated therapeutic strategies. Trends Pharmacol. Sci. 38, 669–686 (2017).
Karimi, E. et al. Single-cell spatial immune landscapes of primary and metastatic brain tumours. Nature 614, 555–563 (2023).
Wang, H. et al. Different T-cell subsets in glioblastoma multiforme and targeted immunotherapy. Cancer Lett. 496, 134–143 (2021).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innov. (N. Y) 2, 100141 (2021).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Gollini, I., Lu, B., Charlton, M., Brunsdon, C. & Harris, P. GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. arXiv [stat.AP] (2013).
Schmidt, U., Weigert, M., Broaddus, C. & Myers, G. Cell detection with star-convex polygons. arXiv [cs.CV] (2018).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
Goode, A., Gilbert, B., Harkes, J., Jukic, D. & Satyanarayanan, M. OpenSlide: A vendor-neutral software foundation for digital pathology. J. Pathol. Inform. 4, 27 (2013).
Shafiq, M. & Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. (Basel) 12, 8972 (2022).
Deng, J. et al. ImageNet: A large-scale hierarchical image database. in 2009 IEEE Conference on Computer Vision and Pattern Recognition https://doi.org/10.1109/cvpr.2009.5206848 (IEEE, 2009).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv [cs.LG] (2014).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Hagberg, A., Swart, P. & Chult, D. S. Exploring network structure, dynamics, and function using networkx. (2008).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).
Ceccarelli, M. et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell 164, 550–563 (2016).
Davidson-Pilon, C. lifelines: survival analysis in Python. J. Open Source Softw. 4, 1317 (2019).
Zheng, Y., Carrillo-Perez, F., Pizurica, M., Heiland, D. H. & Gevaert, O. Spatial Cellular Architecture Predicts Prognosis in Glioblastoma. GBM360 https://doi.org/10.5281/zenodo.8051305 (2023).

Acknowledgements

We would like to extend our sincere gratitude to Dr. Yuan Wang (Department of Neurosurgery, West China Hospital, Sichuan University, China) for generously sharing the high-resolution histology images generated from their recently published spatial transcriptomics study of GBM¹⁸ in response to our request. These images have been made publicly accessible on Figshare as described in the “Data Availability” section. In addition, we thank Dr. Sandra Steyaert (Department of Medicine, Stanford University, CA, USA) for advice on graphical design and statistical analysis. Research reported here was further supported by the National Cancer Institute (NCI) under award R01 CA260271. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Department of Medicine, Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, Stanford, CA, 94305, USA
Yuanning Zheng, Francisco Carrillo-Perez, Marija Pizurica & Olivier Gevaert
Department of Architecture and Computer Technology (ATC), University of Granada, Granada, 18014, Spain
Francisco Carrillo-Perez
Internet technology and Data science Lab (IDLab), Ghent University, Technologiepark-Zwijnaarde 126, Ghent, 9052, Gent, Belgium
Marija Pizurica
Microenvironment and Immunology Research Laboratory, Medical Center, University of Freiburg, Freiburg, 79106, Germany
Dieter Henrik Heiland
Department of Neurosurgery, Medical Center, University of Freiburg, Freiburg, 79106, Germany
Dieter Henrik Heiland
Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
Olivier Gevaert

Authors

Yuanning Zheng
Francisco Carrillo-Perez
Marija Pizurica
Dieter Henrik Heiland
Olivier Gevaert

Contributions

Conceptualization, Y.Z. and O.G.; Methodology, Y.Z. and O.G.; Investigation, Y.Z., F.P. and M.P.; Writing-Original Draft, Y.Z.; Writing-Review & Editing, Y.Z., F.P., M.P., D.H.H. and O.G.; Resources, D.H.H. and O.G.; Funding Acquisition and Supervision, O.G.

Corresponding author

Correspondence to Olivier Gevaert.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Yuan Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zheng, Y., Carrillo-Perez, F., Pizurica, M. et al. Spatial cellular architecture predicts prognosis in glioblastoma. Nat Commun 14, 4122 (2023). https://doi.org/10.1038/s41467-023-39933-0

Received: 15 February 2023
Accepted: 30 June 2023
Published: 11 July 2023
DOI: https://doi.org/10.1038/s41467-023-39933-0
