Introduction

Glioblastoma (GBM) is one of the most feared human diseases because of its high mortality rate and the accompanying loss of cognitive function during the disease process. At present, however, there are few prognostic biomarkers or predictors of therapeutic response, and few therapeutic interventions that strongly affect disease outcome1,2,3. Although patients with IDH1 pathogenic variants have a significantly better prognosis than those with wild-type IDH1, these pathogenic variants are observed in only 4–7% of primary GBM patients, restricting their use as a biomarker4. Similarly, DNA methylation status in the MGMT promoter region is a predictive biomarker for response to temozolomide treatment5, but it is applicable only to non-recurrent GBM of the classical subtype6.

Cancers are increasingly being classified based on their histopathological and molecular characteristics, leading to the trend of precision cancer medicine7,8,9. GBM has been classified into several molecular subtypes based on gene expression profiles: classical, mesenchymal, (neural), and proneural1,10. Although these subtypes have distinct molecular signatures and etiologic factors, their relationship with overall survival (OS) is ambiguous except for patients with IDH1 pathogenic variants1,10,11. The inability to determine patient outcomes from histopathological features and current molecular subtypes hinders effective management of GBM. Thus, clinically relevant GBM subtypes, with sufficient information on prognosis and biological phenotype, are required to optimize treatment. In this regard, this study assessed novel transcriptome-based prognostic subtypes and their multi-omics signatures as a new taxonomy of GBM patients: invasive, mitotic, and intermediate. We expect that our prognostic stratification of GBM has important clinical implications for the diagnosis and treatment of patients, providing a framework that unifies transcriptomic, genomic, and clinical signatures.

Results

Overview of the approach

Figure 1a shows a graphic flow chart of this study: (1) The correlation between the expression level of each gene and patient OS was calculated using the oncopression and The Cancer Genome Atlas (TCGA)-GBM databases. (2) Forty genes correlating with poor prognosis and 40 correlating with favorable prognosis (PGs) were selected. (3–4) GBM samples were subjected to single sample gene set enrichment analysis (ssGSEA) using these PGs, and prognostic subtypes were assigned. In addition to oncopression and TCGA-GBM, the Repository for Molecular Brain Neoplasia Data (REMBRANDT) and Severance datasets were used for validation. (5) Prognostic subtypes were functionally annotated using over-representation analysis (ORA); thereby, subtypes with poor and favorable prognosis were named “invasive” and “mitotic”, respectively.

Figure 1

Flow chart of the study and selection of PGs. (a) Graphical flow chart describing the calculation, assignment, and functional annotation of GBM prognostic subtypes. (b) GBM datasets retrieved from the oncopression and TCGA databases were grouped by Verhaak’s molecular subtypes, and OS was compared by the Kaplan-Meier method (not significant by log-rank test). (c) Each dot indicates the PCC of each gene calculated by correlation with OS. In the right panel, PCCs calculated from oncopression (x-axis) and TCGA-GBM (y-axis) are presented as a scatter plot with a linear regression line (P < 0.001). The 40 highest (favorable) and lowest (poor) genes (PGs) are marked with colors. (d,e) Expression levels of these 80 PGs were compared in patients with low OS and high OS (d), and between normal and GBM samples (e).

Identification of PGs in GBM using transcriptome analysis

GBM samples were classified according to Verhaak’s subtypes, the most widely used gene expression-based classification of GBM1,10, and OS was compared among these subtypes by the Kaplan-Meier method. Although several distinct molecular features have been reported for these subtypes, they showed no relationship with OS (Fig. 1b). To identify PGs, Pearson’s correlation coefficients (PCCs) were calculated between the expression level of each gene and patient OS. Genes with the strongest positive or negative PCCs in both oncopression and TCGA-GBM were classified as PGs (Fig. 1c and Table 1). Supplementary Fig. S1 shows functional interactions among these PG sets. As expected, the 40 poor PGs showed higher expression levels in patients with shorter OS, whereas the 40 favorable PGs showed higher expression levels in patients with longer OS (Fig. 1d). Notably, most poor PGs showed higher expression in GBM than in normal samples (Fig. 1e).

Table 1 Gene sets associated with prognosis of GBM patients (PGs).

Assignment of prognostic subtypes for GBM

Using these PGs, GBM samples were subjected to ssGSEA to evaluate their prognosis scores, the criterion for subtype assignment; patients with prognosis scores < −1, > 1, and between −1 and 1 were classified into the poor (invasive), favorable (mitotic), and intermediate subtypes, respectively (Supplementary Fig. S2). Linear regression analyses of four independent datasets (oncopression, TCGA-GBM, REMBRANDT, and Severance) revealed that the prognosis score correlated significantly with OS of GBM patients (Fig. 2a). Univariate Cox regression on the prognosis score gave the following hazard ratios (HRs) with 95% confidence intervals (CIs): oncopression, HR = 0.837 (0.770–0.910); TCGA-GBM, HR = 0.804 (0.736–0.878); REMBRANDT, HR = 0.818 (0.738–0.907); Severance, HR = 0.780 (0.650–0.934). Each of the GBM datasets was divided into three groups according to prognosis score, and OS was compared among these groups using the Kaplan-Meier method. In all datasets, OS was significantly longer in the favorable than in the poor group, confirming that this transcriptome-based GBM classification into prognostic subtypes reflects patient OS (Fig. 2b and Supplementary Table S1). Notably, TCGA-GBM samples including only IDH1-wild-type patients, which are distinguished from IDH1-mutant GBM in the 2016 WHO classification of tumors, also showed similar patterns, suggesting that the longer OS of the favorable group did not simply arise from IDH1 pathogenic variants (Fig. 2a,b). In addition, similar patterns were observed in RNA-seq data (TCGA), and the prognosis scores of matched patients on these two platforms correlated significantly, suggesting that this method is applicable to both microarray and RNA-seq platforms (Fig. 2c). We also evaluated prognosis scores in low-grade glioma samples (grades 2 and 3). Prognosis scores decreased significantly with increasing tumor grade, suggesting that this method is applicable to datasets that include low-grade glioma samples (Fig. 2d). When we examined the relationship of this classification with Verhaak’s molecular subtypes, we found that the mesenchymal subtype was more enriched in the poor than in the favorable group, whereas the proneural subtype was more enriched in the favorable than in the poor group (Fig. 2e).
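The assignment rule can be illustrated with a minimal Python sketch. It follows the definition given in the Methods (prognosis score = standardized favorable score minus standardized poor score) and the thresholds of −1 and 1. The function names, the use of the population standard deviation, and the treatment of scores exactly equal to a threshold as intermediate are assumptions made for illustration; the ssGSEA enrichment scores are taken as given inputs.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score enrichment scores across all samples.

    Assumption: population standard deviation; the source does not state
    which estimator was used.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def prognosis_scores(favorable_es, poor_es):
    """Prognosis score per sample: favorable z-score minus poor z-score."""
    fav_z = standardize(favorable_es)
    poor_z = standardize(poor_es)
    return [f - p for f, p in zip(fav_z, poor_z)]


def assign_subtype(score, low=-1.0, high=1.0):
    """Map a prognosis score to a prognostic subtype.

    Assumption: scores exactly at a threshold are called intermediate.
    """
    if score < low:
        return "invasive"      # poor prognosis
    if score > high:
        return "mitotic"       # favorable prognosis
    return "intermediate"


# Hypothetical ssGSEA enrichment scores for three samples.
scores = prognosis_scores([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
subtypes = [assign_subtype(s) for s in scores]
```

In this toy input the first sample has a low favorable score and a high poor score, so it falls below −1, while the third sample shows the opposite pattern.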

Figure 2

Assignment of prognostic subtypes. (a) Each dot indicates the prognosis score and OS of each GBM sample. Vertical dashed lines indicate threshold values (−1 and 1) for subtype assignment. The Pearson correlation was significant in all datasets; the linear regression line is shown in black. (b) Survival probability for each prognostic subtype was estimated based on Kaplan-Meier curves. Statistical significance was determined by the log-rank test (P < 0.001 for oncopression, TCGA-GBM, and REMBRANDT; P < 0.05 for Severance). (c) Plots corresponding to (a) and (b) (left and center, respectively) are shown for RNA-seq data (TCGA). The scatter plot shows the correlation between prognosis scores of matched patients obtained from microarray and RNA-seq data (TCGA; right). The Pearson correlation was significant (P < 0.001, R = 0.84); the linear regression line is shown in black. (d) Prognosis scores were compared among grades 2–4 glioma samples from oncopression and REMBRANDT. Differences among groups were compared by one-way ANOVA with Tukey’s post hoc test for multiple comparisons; ***P < 0.001. (e) Distribution of GBM molecular subtypes is presented as heat maps. Upper line: P, poor; I, intermediate; F, favorable. Middle and bottom lines: C, classical; M, mesenchymal; N, neural; P, proneural.

Functional annotation of GBM prognostic subtypes

To determine the biological characteristics of each prognostic subtype, we first identified differentially expressed genes (DEGs) between the poor and favorable groups in the oncopression and TCGA-GBM datasets (Fig. 3a and Supplementary Data S1). These DEGs were then subjected to ORA for functional annotation. ORA using four gene set databases revealed that invasion- and immune-related gene sets were significantly enriched in the poor group, whereas cell cycle-related gene sets were significantly enriched in the favorable group (Fig. 3b). Similar results were reproduced in enrichment maps using the gene ontology (GO) hierarchy, in that many enriched GO terms in the poor group were in cell migration and invasion modules, whereas cell cycle-related GO terms were enriched in the favorable group (Supplementary Fig. S3). Because signal transduction- and immune-related gene sets are frequently enriched in non-tumor samples, we focused on invasion-related gene sets in the poor group. The poor group was therefore named the “invasive” subtype, and the favorable group was named the “mitotic” subtype. Interestingly, prognosis scores correlated significantly with the invasive property of GBM samples, as illustrated in both MR images of GBM patients (Fig. 3c) and collagen-based in vitro 3D invasion assays of patient-derived GBM tumorspheres (TSs; Fig. 3d). Representative images of both subtypes are presented in Fig. 3e. In addition, MGMT methylation was significantly enriched in the mitotic subtype, suggesting that different therapeutic strategies may be required for patients with these two prognostic subtypes (Fig. 3f). Ki-67 expression did not differ significantly between the two subtypes, probably because proliferation is a universal hallmark of cancer. The clinical characteristics of patients, including age and sex, did not affect subtype classification (Supplementary Fig. S4). HRs with 95% CIs obtained by the Cox regression model are provided in Supplementary Table S2. Collectively, these data suggest that prognostic subtypes have distinct biological phenotypes, differing especially in invasive properties.

Figure 3

Functional annotation of prognostic subtypes. (a) Expression levels of DEGs are displayed as a heat map. DEGs were defined as genes with P < 0.001 (FDR correction for multiple comparisons) between poor and favorable groups. (b) Functional annotation of DEGs was performed by ORA. Statistical significance was determined using Fisher’s exact test, and enrichment scores are presented as a heat map. P, poor; F, favorable. (c–e) Using the Severance dataset, invasiveness was evaluated in MR images of GBM patients (c) and GBM TSs (d). Scatter plots show the correlation between invasiveness and prognosis or poor score. Representative figures are presented in (e). (f) DNA methylation status in the MGMT promoter region was compared between prognostic subtypes (Severance; OR = odds ratio). Differences in subtypes were compared by two-tailed Student’s t-test in (a,c,d) and by Fisher’s exact test in (f).

Genomic signatures of GBM prognostic subtypes

We next examined multi-omics signatures of each prognostic subtype. The distribution of genomic alterations in recurrently mutated genes (TCGA) showed that pathogenic variants in several genes occurred exclusively in the invasive or mitotic subtype. Pathogenic variants in CDH18, WNT2, COL1A2, and TGFA, all of which are associated with invasion12,13,14, were observed only in the invasive subtype. In contrast, pathogenic variants in IDH1 and ATRX, which are associated with good prognosis4, were observed only in the mitotic subtype, consistent with our prognostic subtype classification. Moreover, the glioma-CpG island methylator phenotype (G-CIMP), which is associated with good prognosis15, was observed exclusively in the mitotic subtype (Fig. 4). DNA methylation status other than G-CIMP also showed distinct patterns in the invasive and mitotic subtypes (Supplementary Fig. S5). Differences in copy number alteration (CNA) and the corresponding gene list are shown in Supplementary Fig. S5 and Supplementary Data S2.

Figure 4

Genomic signatures of prognostic subtypes (TCGA-GBM). Distribution of somatic pathogenic variants of recurrently altered genes in GBM. Samples were separately clustered according to prognostic subtypes, and heat maps indicate prognosis score or G-CIMP status of each sample. Invasive or mitotic subtype samples with at least one pathogenic variant were included. Genes mentioned in the text are highlighted.

Markers of GBM prognostic subtypes

For clinical convenience, we also identified the genes that best differentiated between the invasive and mitotic subtypes, based on the intersection between the PGs listed in Table 1 and the DEGs identified in Fig. 3a. Of the 48 genes identified, PDPN and TMEM100 showed the greatest differential expression between these two prognostic subtypes, except for genes encoding secreted proteins (Supplementary Data S1). In all four independent datasets, PDPN showed significantly higher expression levels in the invasive subtype, whereas TMEM100 showed significantly greater expression in the mitotic subtype (Fig. 5a). Moreover, expression levels of PDPN were significantly correlated with increasing glioma grade, whereas TMEM100 showed the opposite pattern (Fig. 5b), suggesting that both of these markers are associated with prognosis, even when low-grade glioma samples were included. Immunohistochemistry (IHC) confirmed markedly higher PDPN levels in the invasive subtype and TMEM100 levels in the mitotic subtype (Fig. 5c). These findings suggest that PDPN is a marker for the invasive subtype, and TMEM100 is a marker for the mitotic subtype.

Figure 5

Expression of markers for prognostic subtypes. (a,b) Expression levels of PDPN and TMEM100 in each prognostic subtype (a) and in grades 2–4 glioma (b). Differences among groups were compared by one-way ANOVA with Tukey’s post hoc test for multiple comparisons; *P < 0.05, **P < 0.01, ***P < 0.001 in (a) denote significant differences compared with the invasive group. (c) Expression levels of PDPN and TMEM100 were measured by IHC (brown). In all images, hematoxylin (blue) was used to counterstain nuclei (red scale bar = 20 μm).

Discussion

Owing to enormous heterogeneity of tumors including GBM16, histologically defined tumors should be further divided into subgroups using molecule-level criteria. Categorizing GBM into subtypes may result in more precise treatment, enabling rational therapy based on subgroup-specific targets. Here, we show prognostic subtypes of GBM in terms of large-scale gene expression profiles. These novel GBM subtypes had differential biological phenotypes and multi-omics signatures, including differences in somatic pathogenic variants, DNA methylation, and CNA. The reproducibility of this classification was validated in four independent datasets, including one based on samples from our institution (Severance).

The importance of this study lies in the fact that these prognostic subtypes are interrelated with distinct biological phenotypes, such as invasiveness. Migratory and invasive capabilities of tumors, along with mesenchymal transition and distant metastasis, are hallmarks of cancer associated with poor prognosis17,18. Of our prognostic subtypes of GBM, the poor (invasive) subtype showed significantly greater invasiveness than the favorable (mitotic) subtype, consistent with previous findings. Moreover, a methylated MGMT promoter region correlated significantly with the mitotic subtype, indicating that patients with this subtype were more likely to respond to temozolomide5. These prognostic subtypes are clinically relevant, enabling patient stratification and enhancing integrative understanding of GBM. These results also suggest that therapeutic strategies should be based on prognostic subtypes; for example, patients with the mitotic subtype can be treated with temozolomide, whereas patients with the invasive subtype should receive therapeutic interventions targeting tumor invasiveness. In this regard, we plan to evaluate novel therapeutic strategies targeting invasiveness in future studies.

The prognostic subtypes identified here are in good agreement with previously reported GBM prognosis-associated signatures, including IDH1 pathogenic variants and G-CIMP. Although the prognostic subtypes were based solely on transcriptome and OS, genomic signatures such as somatic pathogenic variants in IDH1 and ATRX, and G-CIMP, a DNA methylation signature, were exclusively present in the mitotic subtype. Because these signatures have been associated with favorable outcomes4,15, our subtypes reflect not only transcriptomic factors but also previously reported multi-omics markers in GBM patients. Interestingly, several genes in the poor PG set overlapped with previously reported G-CIMP transcriptome signature genes19. Studies are needed to determine whether this sample subset can be identified using both our OS-based subtypes and somatic pathogenic variant- or methylation-based subtypes. Moreover, our method was validated in TCGA RNA-seq data as well as in various microarray chips, implying its flexibility for multi-platform analyses.

Classification based on transcriptomes and OS has a distinct advantage, as in practice it is difficult to obtain all available multi-omics data, including transcriptome, genome, methylome, and DNA structural variation, from individual patients due to cost and overtreatment problems. Because our method uses only transcriptome and OS information, the prognostic subtypes we identified may be applicable clinically, as well as in research. Moreover, we identified single gene markers for each prognostic subtype, diminishing costs per patient, even though these results may not be as robust as those obtained from genome-wide expression levels. Future biomarker assays for GBM may include molecular tests for these prognostic subtypes.

Future studies are also required to assess the list of poor PGs (Table 1). In addition to determining prognostic subtypes, this list contains genes not previously recognized as GBM-associated genes. Because their expression levels are indicative of GBM prognosis, inhibition of subsets of these genes may prolong OS. Although a mechanistic explanation of all these genes in relation to GBM progression is beyond the scope of this study, they may have clinical use as prognostic biomarkers and novel drug targets, and may offer new insights into GBM pathology and etiology.

Methods

Public datasets

The primary sources of samples were the oncopression20, TCGA, and REMBRANDT21 databases. From oncopression (http://oncopression.com), preprocessed microarray gene expression data were retrieved (normal brain, n = 723; grade 2 astrocytoma, n = 133; grade 3 astrocytoma, n = 132; GBM, n = 865), and survival information was obtained for 174 GBM patients. From TCGA, preprocessed multi-omics GBM datasets were obtained through cBioPortal22,23 (U133 microarray, n = 495; RNA-seq, n = 166; somatic pathogenic variant data from whole exome sequencing, n = 491; methylation, n = 254 for HM27 and n = 84 for HM450; CNA from GISTIC 2.0, n = 478) with survival information for 496 GBM patients. Secondary or recurrent GBM samples were excluded, and G-CIMP status was determined as described6. The REMBRANDT gene expression dataset (E-MTAB-3073) was obtained from ArrayExpress (grade 2 astrocytoma, n = 65; grade 3 astrocytoma, n = 58; GBM, n = 228) with survival information for 187 GBM patients.

Patient information (Severance)

Samples were obtained from 52 non-recurrent GBM patients treated at Severance Hospital (Table 2). To obtain gene expression profiles using microarrays, total RNA was extracted from each tissue sample using Qiagen RNeasy Plus Mini kits and loaded onto the Illumina HumanHT-12 v4 Expression BeadChip (Illumina, San Diego, CA, USA). The data were variance-stabilizing transformed and quantile normalized using the R/Bioconductor lumi package24. MR images of patients were taken using an Achieva 3.0T system (Philips Medical Systems, Best, Netherlands) within 7 days before the respective brain tumor removal. Axial images were planned parallel to the anterior and posterior limb of the corpus callosum. Because T1 contrast-enhanced (CE) and T2 fluid-attenuated inversion recovery (FLAIR) images approximately indicate the primary tumor core region and the invasive front region, respectively, we quantified invasiveness as the area ratio (T2 FLAIR − T1 CE)/T1 CE, as previously suggested25. Among the axial MR images, the sections with the largest tumor area were selected for quantification of invasiveness, and two authors (J.P. and S.-J.Y.) independently measured tumor area to minimize rater bias. Experiments in this study were approved by the institutional review board of Severance Hospital, Yonsei University College of Medicine (4-2012-0212, 4-2014-0649), and all participants provided written informed consent. All experiments were performed in accordance with relevant guidelines and regulations.
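Written out as a formula, the MR-based invasiveness measure described above takes the following form. The symbols A_FLAIR and A_CE are introduced here only for illustration and denote the areas measured on the selected axial section in the T2 FLAIR and T1 CE images, respectively.

```latex
\[
\text{Invasiveness} \;=\; \frac{A_{\mathrm{FLAIR}} - A_{\mathrm{CE}}}{A_{\mathrm{CE}}}
\]
```

Under this definition, a value of 0 means the FLAIR signal does not extend beyond the contrast-enhanced core, and larger values indicate a larger invasive front relative to the core.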

Table 2 Clinical characteristics of the samples in Severance dataset.

Isolation of GBM TSs and 3D invasion assay

TS-forming GBM cells were established from fresh GBM tissue specimens as previously described26. For TS culture27, cells were cultured in TS complete media, composed of DMEM/F-12 (Mediatech, Manassas, VA, USA), 1× B27 (Invitrogen, San Diego, CA, USA), 20 ng/mL bFGF, and 20 ng/mL EGF (Sigma-Aldrich, St. Louis, MO, USA). For 3D invasion assays27, each well of a 96-well plate was filled with a mixed matrix composed of Matrigel, collagen type I (Corning Incorporated, Corning, NY, USA), and TS complete media. Single spheroids were seeded inside the matrix prior to gelation, followed by the addition of TS complete media over the gelled matrix to prevent drying. Invaded area was quantified as the occupied area at (72 h − 0 h)/0 h.

Selection of prognosis-associated genes (PGs) and calculation of prognosis score

PGs were defined as genes highly correlated with OS of GBM patients in both the oncopression and TCGA datasets. Genes whose PCCs were negative in both datasets were considered poor genes; genes whose PCCs were positive in both datasets were considered favorable genes; and genes whose PCCs had different signs in the two datasets were excluded because they were associated with poor prognosis in one dataset and favorable prognosis in the other. The signed product of the PCCs from these two datasets (PCC score) was used to quantify robust correlation (Supplementary Table S3). Because few genes are significantly correlated with favorable prognosis, it is impractical to define PG sets larger than 40 genes for each prognostic subtype. After sorting according to this metric, therefore, 40 poor PGs and 40 favorable PGs were selected (Table 1). Using these PGs, GBM expression profiles were subjected to ssGSEA, and enrichment scores were standardized across all samples. The poor and favorable scores of each GBM sample were defined as these standardized scores, and the prognosis score was defined as (favorable score − poor score). To confirm that the number of genes in each PG set was appropriate, we also calculated prognosis scores using the top 20 × 2 genes rather than 40 × 2 genes. The correlation between prognosis scores obtained using these two gene sets was statistically significant (P < 0.001 for all datasets), suggesting that the results were very similar (Supplementary Fig. S2). When outcomes are similar, larger gene sets can generate more robust results across independent cohorts. Based on these results, we finally defined two PG sets of 40 genes each. Functional interactions among PGs were constructed as network maps using Cytoscape28 and the Reactome FI29 plug-in.
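The selection rule for PGs can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' code: the function names are hypothetical, the per-gene PCCs for the two datasets are passed in as dictionaries, and the "product of PCCs with sign" is interpreted as the product of the two coefficients carrying their shared sign.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pcc_score(r_a, r_b):
    """Signed product of two PCCs; None when the signs disagree (gene excluded)."""
    if r_a * r_b <= 0:
        return None
    sign = 1.0 if r_a > 0 else -1.0
    return sign * r_a * r_b


def select_pgs(pcc_a, pcc_b, k=40):
    """Return (poor, favorable) gene lists, each holding up to k genes.

    pcc_a and pcc_b map gene symbol -> PCC with OS in each dataset.
    """
    scores = {}
    for gene in pcc_a.keys() & pcc_b.keys():
        s = pcc_score(pcc_a[gene], pcc_b[gene])
        if s is not None:
            scores[gene] = s
    ranked = sorted(scores, key=lambda g: (scores[g], g))
    poor = [g for g in ranked if scores[g] < 0][:k]
    favorable = [g for g in reversed(ranked) if scores[g] > 0][:k]
    return poor, favorable


# Toy PCCs for five hypothetical genes in two datasets.
toy_a = {"G1": -0.5, "G2": 0.4, "G3": 0.3, "G4": -0.1, "G5": 0.2}
toy_b = {"G1": -0.4, "G2": 0.5, "G3": -0.3, "G4": -0.2, "G5": 0.1}
poor, favorable = select_pgs(toy_a, toy_b, k=2)
```

In the toy input, G3 is dropped because its correlation with OS changes sign between datasets, which mirrors the exclusion criterion stated above.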

Cutoff values of prognostic score and assignment of prognostic subtypes

Because the TCGA-GBM dataset is not a validation set and contains a sufficient number of samples across a range of prognosis scores, we used it for a sensitivity analysis of cutoff values for subtype assignment, ranging from −2.0 to 2.0. After dividing the whole cohort into poor and favorable subgroups at a given cutoff value of the prognosis score, we calculated the P-value of the log-rank test. These data show that a cutoff of −1.0 yields a distinct local minimum of the P-value, suggesting −1.0 as a suitable cutoff value (Supplementary Fig. S2). To prevent samples with moderate prognosis scores from being assigned to the poor or favorable subtype, we introduced an intermediate subtype between them, which required a second cutoff value. A cutoff value of 1.0 generated a very low P-value, with a plateau at cutoff values larger than 1.0. Because more extreme cutoff values cause large differences in subtype size (Supplementary Fig. S2), we finally selected −1.0 and 1.0 as the cutoff values for assignment of prognostic subtypes.
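A cutoff scan of this kind can be sketched in plain Python. The code below is a minimal illustration under stated assumptions, not the software used in the study: it implements the standard two-group log-rank statistic with a chi-square (1 degree of freedom) P-value, the function names are hypothetical, and samples with a score below the cutoff are treated as the poor group.

```python
from math import erfc, sqrt


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square (1 df) P-value.

    events are 1 for an observed death and 0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk in A
        d = sum(1 for ti, e, _ in data if ti == t and e)           # deaths at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def scan_cutoffs(scores, times, events, cutoffs):
    """Log-rank P-value for each cutoff; cutoffs leaving a group empty are skipped."""
    results = []
    for c in cutoffs:
        low = [i for i, s in enumerate(scores) if s < c]
        high = [i for i, s in enumerate(scores) if s >= c]
        if not low or not high:
            continue
        p = logrank_p([times[i] for i in low], [events[i] for i in low],
                      [times[i] for i in high], [events[i] for i in high])
        results.append((c, p))
    return results


# Hypothetical cohort: low scores die early, high scores die late.
toy_scores = [-2.0, -1.5, -1.2, -1.1, -1.05, 0.5, 1.2, 1.5, 1.8, 2.0]
toy_times = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
toy_events = [1] * 10
scan = scan_cutoffs(toy_scores, toy_times, toy_events, [-3.0, -1.0])
```

In practice the returned list would be plotted against the cutoff to look for local minima and plateaus of the kind described above, alongside the resulting group sizes.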

Functional annotation of prognostic subtypes

Functional annotation of DEGs between the invasive and mitotic subtypes was performed by ORA using gene sets obtained from the MSigDB (KEGG and hallmark), QuickGO, and GO slim databases. Gene sets were manually categorized according to the functional similarity of terms. Statistical significance was determined using Fisher’s exact test, and enrichment scores were depicted as a heat map (GENE-E software). Additionally, ORA results with GO terms were visualized as an enrichment map using Cytoscape and the ClueGO30 plug-in. Enriched GO terms were functionally categorized based on their kappa scores (> 0.4). Statistical significance was determined using a two-sided hypergeometric test, and only nodes with Bonferroni-adjusted P-value < 0.001 were displayed.
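The core ORA calculation can be illustrated with a short Python sketch. It computes the one-sided enrichment P-value from the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the 2 × 2 overlap table. The one-sided form, the function names, and the choice of gene universe are assumptions for illustration; the source does not specify these details for the heat-map analysis.

```python
from math import comb


def ora_p_value(n_universe, n_gene_set, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn from n_universe genes,
    of which n_gene_set belong to the gene set (hypergeometric upper tail)."""
    upper = min(n_gene_set, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_gene_set, k) * comb(n_universe - n_gene_set, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def ora_from_sets(universe, gene_set, degs):
    """Convenience wrapper that restricts both lists to the shared universe."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    degs = set(degs) & universe
    return ora_p_value(len(universe), len(gene_set), len(degs),
                       len(gene_set & degs))


# Hypothetical example: 10 genes, a 5-gene set, and 5 DEGs that all fall in the set.
p_full_overlap = ora_p_value(10, 5, 5, 5)
p_from_sets = ora_from_sets(range(10), range(5), range(5))
```

When many gene sets are tested, the resulting P-values would still need a multiple-testing correction, such as the Bonferroni adjustment mentioned above for the enrichment map.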

IHC of marker proteins

Brain tissues from GBM patients were sliced into 5-μm-thick sections using a microtome and then transferred onto adhesive slides. Antigen retrieval and antibody attachment were performed using an automated instrument (Discovery XT, Ventana Medical Systems, Tucson, AZ, USA). PDPN (Santa Cruz Biotechnology, Santa Cruz, CA, USA) and TMEM100 (OriGene, Rockville, MD, USA) were detected using a peroxidase/DAB staining system. All images were counterstained with hematoxylin.