Metabolic/Proteomic Signature Defines Two Glioblastoma Subtypes With Different Clinical Outcome

Glioblastoma (GBM) is one of the deadliest human cancers. Because of the extremely unfavorable prognosis of GBM, it is important to develop more effective diagnostic and therapeutic strategies based on biologically and clinically relevant subclassification systems. Analyzing a collection of seventeen patient-derived glioblastoma stem-like cells (GSCs) by gene expression profiling, NMR spectroscopy and signal transduction pathway activation, we identified two GSC clusters, one characterized by a pro-neural-like phenotype and the other showing a mesenchymal-like phenotype. Evaluating the levels of proteins differentially expressed by the two GSC clusters in the TCGA GBM sample collection, we found that SRC activation is associated with a GBM subgroup showing better prognosis whereas activation of RPS6, an effector of mTOR pathway, identifies a subgroup with a worse prognosis. The two clusters are also differentiated by NMR spectroscopy profiles suggesting a potential prognostic stratification based on metabolic evaluation. Our data show that the metabolic/proteomic profile of GSCs is informative of the genomic/proteomic GBM landscape, which differs among tumor subtypes and is associated with clinical outcome.

Scientific RepoRts | 6:21557 | DOI: 10.1038/srep21557 Growing evidences confirm that GBM contains a subpopulation of cells displaying stem-like properties reminiscent of normal stem cells, called tumor-initiating cells or GBM stem-like cells (GSCs) that are believed to play a fundamental role in tumor resistance to chemo-or radiotherapy and in recurrence 12 . GSCs can be isolated to generate cell lines characterized by self-renewing, multipotency, and highly tumorigenic ability and are reported to mirror both the genomic and the gene expression profiles of the original tumor more closely than conventional serum-cultured glioma cell lines 13,14 . Although the functional criteria defining GSCs are widely accepted, the molecular characteristics of these cells have not been fully identified 12 . As expected from the heterogeneous histology of GBM, there is extensive cellular heterogeneity within GSCs as well 11 . The complex interplay of signaling pathways and the lack of common molecular markers identifying GSCs further complicate the analysis of these cells.
To further dissect the molecular biology of GBM and seeking for appropriate clinical targets to be exploited for drug treatment, we analyzed our collection of patient-derived GSCs by combining complementary molecular approaches. Taking into account the most variable genes/transcripts, gene expression profiling of GSCs revealed two distinct clusters. These clusters closely overlapped those obtained both from metabolic analysis by NMR spectroscopy and from signal transduction pathway activation, as assessed by reverse-phase protein microarray technology (RPPA). In addition, we analyzed available RPPA data from TCGA to evaluate the capability of proteins differentially expressed in the two GSC clusters to predict patient survival.

Results
Gene expression profiling identifies two distinct phenotypes of GSC lines. Tumor samples from 55 GBM patients were mechanically dissociated and cultured in stem cell medium. Seventeen of the 55 GBMs (31%) generated GSC lines. There were no significant differences between GSC-generating tumors and those tumors that did not generate GSC cultures relative to patient age and sex, symptom duration, and tumor location. Notably, median overall and progression free survival were significantly shorter in tumors that generated GSC cultures as compared with those that did not 15 . GSC lines were expanded and previously validated for their stem cell properties, such as the ability to self-renew, to generate progeny of multiple lineages in differentiating culture conditions and to phenocopy the patient tumor in mouse xenografts. In addition they were characterized for the expression of stem cell markers, including CD133 and SOX2, and for the presence of tumor-specific genetic changes 15 .
Gene expression data were collected from all the GSC lines and unsupervised hierarchical clustering of the samples using the 1,000 most variable genes/transcripts (below and above the interquartile range of the full dataset) produced two distinct GSC clusters (Fig. 1A), reminiscent of the separation into GSf (full stem) and GSr (restricted stem) phenotypes previously reported by Günther and colleagues, and by Schulte and colleagues 16,17 . The GSf phenotype is characterized by proneural-like gene expression signature, growth as floating spheres in vitro, CD133 expression, and high invasiveness in vivo. Conversely, GSCs with a GSr phenotype show mesenchymal-like expression signature, no CD133 expression, in vitro adherent growth, and low invasive behavior in vivo 16 .
The 470 genes differentially regulated between our two GSC clusters ("GSC signature", Supplementary Table S1a) were then subjected to a Gene Set Enrichment Analysis (GSEA) 18,19 . Chromosomal loci 19p13, 19p12 and, to a somewhat lesser extent, 19q13 were highly enriched in the GSC signature (Supplementary Table S1b). All these regions contain a large number of Krüppel-Type Zinc Finger proteins 20 . Region 19q13 has been reported as hypermethylated in oropharyngeal squamous cell carcinoma 21 and copy-number-gain has been described in GBM 22 . The GSEA analysis also showed partial overlaps with mesenchymal, proneural and classical GBM sub-signatures established by Verhaak and colleagues 8 , an adult tissue stem-cell signature 23 , a stromal stem cell signature 24 and oligodendrocyte differentiation 25 (Supplementary Table S1c). Clustering of the GSC expression data using these partial gene lists (Fig. 1B) indicates a higher degree of "stemness" in the GSr-like lines coupled to a more mesenchymal-like phenotype. On the contrary GSf-like lines showed a more proneural/classical-like phenotype with reduced neural differentiation features and higher expression of regions 19p12 and 19p13. In line with this observation, proneural GBM has been recently proposed to arise from a non-stem-cell precursor, in contrast to the other three GBM subtypes 26,27 .
To further characterize our samples and compare them to previously described signatures, a large unified dataset was generated including data from both the Schulte GSf/GSr 17 and the Verhaak TCGA GBM studies 8 . Genes/probes from the different microarray platforms were integrated through a batch correction procedure using only two global sample types, stem-cell or bulk tumor, i.e. without explicitly assigning a GSf or GSr phenotype to any sample. In the unified dataset, the GSC and GSf/GSr samples reproduce two homogeneous clusters with GSf(-like) or GSr(-like) phenotype, respectively (Fig. 1C), thus confirming the initial phenotype assignment of the GSC clones. Extending the analysis to include also the four subtypes of TCGA GBM samples and clustering by the Verhaak GBM subtype signatures, the GSr/GSr-like and GSf/GSf-like samples distinctly co-clustered with either the mesenchymal (M) or proneural (P)/classical (C) tumor subtypes, respectively ( Supplementary Fig. S1a). Moreover, the overlap between the top 1000 most variable genes in GSf-vs GSr-like and the top 1000 most variable genes in GSf/GSr combined dataset, showed that 609 genes are shared between GSf-/GSr-like and GSf/GSr (Supplementary Fig. S1b and Supplementary Table S1d). Interestingly, RNAseq analysis of 447 cancer-related genes on 167 TCGA GBM samples, shows a less stringent division between known subgroups when compared to the complete microarray panel ( Supplementary Fig. S2).
(GABA), glutamate (Glu) and aspartate (Asp), which are major inhibitory or excitatory neurotransmitters in the brain; mobile lipids (MLs), often associated with cell death typical in brain malignancies 29 ; myoinositol (Myo-I), related with the grade of brain tumors 30 ; glutamine (Gln), related to the glial phenotype, and glutathione (GSH), an antioxidant regulator of ROS level 31 .
Clearly detectable signals of the astrocytic marker Myo-I and Gln were present in all GSC spectra, although with different signal intensities.
Unsupervised cluster analysis on the 17 GSC lines resulted in a dendrogram clearly separating the samples into two clusters (Fig. 2A). The first cluster, which is characterized by low lipids and high NAA and Gln, includes all GSf-like lines except the GSC line #112, whereas all GSr-like lines are grouped in the second cluster, characterized by high lipids, low Gln and absent NAA. NAA signals were clearly observed only in GSf-like lines, indicating a metabolic fingerprint typical of neurons (Fig. 2B). These signals were barely detectable in GSr-like spectra, thus suggesting that the neuronal metabolism, although present, is not prevalent in these lines. The intensity of MLs signal was higher in GSr-like line spectra compared to GSf-like ones (Fig. 2B,C), indicating a prevalent astrocyte/ glioma-like metabolism. These two clusters closely superimpose those defined by gene expression analysis with the only exception of GSC line #112 ( Fig. 2A), supporting an association of the GSf-like lines with the proneural subtype and of the GSr-like ones with the mesenchymal subtype. Phosphoproteomic analysis confirms differential metabolic profiles in GSf-like and GSr-like cells. Genomic and transcriptomic analyses cannot fully account for the translational and post-translational integration of the complex genomic and transcriptomic aberrations accumulated in each tumor [32][33][34] . We recently demonstrated that reverse-phase protein array (RPPA) analysis identifies pathway activation patterns that correlate with the response of GSCs to kinase inhibitors 35 . Therefore, we performed RPPA analysis on our collection of GSCs to measure the basal state levels of total and/or phosphorylated forms of diverse proteins involved in regulation of stemness, differentiation, migration, cell cycle, DNA damage, proliferation, cell growth, stress response and apoptosis (Supplementary Table S2). A comprehensive graphical representation and a list of the results of RPPA analysis on our GSC lines is available as Supplementary Fig. S3 and Supplementary Table S3, respectively. A pathway-oriented rather than single-endpoint approach to signaling analysis showed that GSf-like cells have a significant increase in SRC, Mitogen Activated Protein Kinase (MAPK), and Insulin-like Growth Factor-Receptor (IGF1-R/IR), whereas GSr-like lines displayed increased levels of phosphorylated proteins associated with the mammalian Target of Rapamycin (mTOR) pathway (Fig. 3). GSf-and GSr-like classification and clinical correlations. In a previous study, we demonstrated that the efficiency in establishing a GSC culture is about 30-40%, and that the capacity to generate GSCs represents an independent prognostic factor for poor patient survival 15 . Thus, we explored possible associations of the GSC subtypes, i.e. GSf-like and GSr-like, with clinical and pathological parameters of the donor patients. Symptom duration prior to diagnosis was significantly shorter, indicative of a more aggressive clinical behaviour, in tumors that generated GSr-like cultures as compared with tumors generating GSf-like cultures (p = 0.0422; Mann-Whithey U test). In addition, a trend towards a worse general status, as assessed by the Karnofsky Performance Status score (KPS), was evident for tumors generating GSr-like cultures (p = 0.0529; Mann-Whithey U test). The median overall survival of the donor patients was 10.5 and 8 months for GSf-like and GSr-like tumors, respectively. No significant differences were found between GSf-and GSr-like generating tumors in terms of patient age or sex, tumor location or diameter, and extent of tumor resection (Supplementary Table S4). Among the molecular variables, EGFRvIII expression was significantly more frequent in tumors generating GSr-like cultures than in those generating GSf-like cultures (p = 0.0345; Fisher exact test; Supplementary Table S5).

Analysis of GSf-and GSr-like molecular signatures in TCGA data. The most comprehensive
(phospho-)proteomic characterization of GBM has recently been done under TCGA initiative, by Brennan and colleagues using RPPA 9 . We explored such data, comprising 181 protein analytes on 251 GBM samples, and found that unsupervised clustering failed to produce a consistent partitioning of the sample cohort into the four clearly-defined gene expression subtypes described by Verhaak and colleagues 8 , but rather grouped them into two dominant GBM subtypes ( Supplementary Fig. S4). Thus, we asked whether the significantly different endpoints identified in GSf-and GSr-like GSCs by our RPPA analysis, could have prognostic value in the large cohort of GBM samples from TCGA. To this end, we used the Glioblastoma Bio Discovery Portal (GBM-BioDP, see methods section for details) and, as a query input, two gene symbol lists based on RPPA endpoints that were differentially expressed between GSf-and GSr-like GSCs (Supplementary Table S6 and methods section). Focusing on the proteins characterizing the GSf-like subgroup, we found that the levels of activated Human Epidermal Growth Factor Receptor 3 (HER3 pY1289) were higher in P versus M subtype (p = 0.00016, Fig. 4A) confirming a relationship between P and GSf subtypes. In addition, HER3 pY1289 was associated with longer overall survival in the global population (p = 0.01; Cox log rank test) ( Fig. 5A and Supplementary Table  S7). Phosphorylation of SRC at Y527 (SRC pY527) was significantly higher in P than in M GBM subtypes (p = 0.00031), whereas SRC pY416 was not different (Fig. 4A and Supplementary Table S7). Interestingly, despite a significant overall decrease in the levels of Focal Adhesion Kinase (FAK) in P versus M subtype (p = 0.00241), apparently challenging our RPPA data (Fig. 4A), FAK expression in TCGA data is significantly associated with poor prognosis only in P subtype (p = 0.0243; Cox log rank test, Supplementary Fig. S5a and Supplementary  Table S7).
After single-endpoint analyses, we exploited the built-in feature of the GBM-BioDP platform to generate a multi-gene prognostic index (mPI) from RPPA data, including all GSf-like endpoints as covariates in a single Cox regression model (Fig. 5B). The resulting mPI (mPI GSf ) was significantly associated with the outcome in both P and M subtypes (p = 0.000035 for the P subgroup and p = 0.000273 for the M subgroup; Cox log-rank test, Supplementary Fig. S6a), and also with the entire patient population (p = 0.000558; Cox log-rank test, Supplementary Fig. S6a) suggesting its potential unrestricted clinical relevance. Notably, levels of HER3 pY1289 and SRC resulted as significant covariates (HR = 0.25, p = 0.0175 and HR = 0.14, p = 0.0394, respectively; Cox log-rank test), while FAK expression did not reach statistical significance (HR = 1.55, p = 0.0896; Cox log-rank test, Fig. 5B). Therefore, we used these three endpoints to build up a simplified mPI GSf model which, despite a lack of power in predicting outcome in the M subtype, was capable of predicting overall survival in the overall population (p = 0.002786; Cox log-rank test) and in the P subtype (p = 0.000207; Cox log-rank test, Supplementary  Table S8 and Supplementary Fig. S7a).
Analyzing GBM TCGA patients using single GSr-like specific endpoints, we observed significantly higher levels of phosphorylated Ribosomal Protein S6 (RPS6 pS235-36) in M versus P (p = 0.00761, Fig. 4B). Survival analysis showed that high RPS6 pS235-36 had a negative impact on survival, specifically in the P subtype (p = 0.0428; Cox log-rank test, Supplementary Fig. S5b and Supplementary Table S7). Notably, high levels of total and phosphorylated Eukaryotic initiation factor 4E binding protein 1 (4EBP1 and 4EBP1 pS65) were significantly associated with a better prognosis in the overall population of TCGA patients (p = 0.0489 and p = 0.0206, respectively; Cox log-rank test, Fig. 5C). High levels of 4EBP1 pS65 and two distinct phosphorylated forms of Epidermal Growth Factor Receptor (EGF-R pY1173 and EGF-R pY992) correlated with a better prognosis only in the P subgroup, whereas high AKT pS473 was markedly associated with poor survival in the same subtype (Supplementary Fig. S5b and Supplementary Table S7).
Similarly to the GSf-like signature, to define a GSr-like mPI (mPI GSr ), we computed a single Cox regression model with all the GSr-like-specific endpoints as covariates (Fig. 5D). Interestingly, we found that levels of RPS6 pS235-36 inversely correlated with patient survival (HR = 9.12, p = 0.0082; Cox analysis, Fig. 5D) and that an increased overall survival was significantly predicted for all patients by the mPI GSr (p = 0.002019; Cox log-rank test, Supplementary Fig. S6b). Addition of age and MGMT status as covariates, significantly increased the capacity of the mPI GSr to predict a worst outcome (p = 0.001965; Cox log-rank test, Supplementary Fig. 7B). In particular, the mPI GSr HR for the P subtype were 5.7 (p = 0.000073; Cox log-rank test) and 13.43 (p = 0.026878; Cox log-rank test) without and with age and MGMT status inclusion, respectively (Supplementary Fig. S6b and Supplementary Fig. S7b). Altogether these results indicate that the activation of two GSf-like endpoints, HER3 and SRC, is associated with a GBM subgroup showing better prognosis whereas activation of RPS6, effector of mTOR pathway and GSr-like endpoint, identifies a subgroup with a worse prognosis.
Finally, based on the subset of GSr-and GSf-like RPPA endpoints which resulted significantly different among the two groups in previous analyses, we tested all possible binary combinations of such analytes that would better stratify TCGA patients into two groups with a significant survival difference. Notably, two specific expression patterns, i.e. i) elevated total SRC or SRC pY527 coupled to low RPS6 pS235-36 and ii) high RPS6 pS235-36 coupled to low total SRC or SRC pY527, allowed us to discriminate two groups of TCGA patients: group with pattern i) displays a significantly better outcome than group with pattern ii) (Fig. 6). These results confirm the potential prognostic role of SRC and RPS6 in a large cohort of GBM patients.

Discussion
Glioblastomas are genetically heterogeneous tumors, suggesting that a diverse set of gene products may act to regulate their behavior and ultimately their outcome. Molecular classification of GBM is still at the beginning, and there is no general consensus on GBM subtypes. Despite several studies have provided a high resolution picture of GBM molecular landscape and revealed major alterations that may drive disease pathogenesis and biology, dependencies of tumors on altered signaling pathways suitable for direct translation to the clinics have not been yet identified and exploited in this tumor. The availability of GBM stem-like cell (GSC) line models reflecting gene expression patterns and phenotypic characteristics of human GBMs 36 more closely than serum-cultured glioma cell lines, may help identifying cues for targeted therapies 14 . Although GSC cultures may represent a valuable tool for obtaining data to be translated onto the clinical setting, their translational power is still debated for at least two limiting factors. The first one is that, using our techniques, we arose GSC cultures from approximately one third of the tumors only. This implies that the GSC paradigm might not be applied in a substantial fraction of patients suffering from GBM but mainly to more aggressive cases, since median overall and progression free survival were significantly shorter in tumors that generated GSC cultures as compared with those that did not 15   GSCs are clonogenic 37 . Nonetheless, GSC cultures represent a valuable surrogate of their parental tumor and, although bearing intrinsic limitations, they allow molecular interrogation and study of one of deadliest tumors such as GBM.
In the attempt to find druggable signaling pathways in GBM, we analyzed a collection of 17 patient-derived GSCs by applying independent but complementary molecular approaches. Taking into account the potential bias of information obtained from our limited GSC samples, we compared our observations obtained on GSCs with datasets from large public databases annotated with clinical records. We were able to define two homogeneous subtypes resembling respectively the GSf and GSr groups described by Schulte, though with distinct molecular signatures.
Comparing genes differentially expressed between the GSf-like and the GSr-like GSC groups with gene sets from the literature, highly significant overlaps were present with two of the Verhaak GBM subtype signatures (mesenchymal and proneural), two stem-cell signatures 23,24 , and a gene set related to oligodendrocyte differentiation 25 . The GSf-like and GSr-like clusters are characterized by a "pro-neural-like" and a "mesenchymal-like" expression signature, respectively. However, it should be emphasized that, comparing the molecular signature of our GSCs with that of GBM TCGA, which reflects the bulk tumor, may represent a limiting factor of this analysis.  Table plots of hazard ratios (HR) for expression levels of individual RPPA analytes, selected as distinctive of GSf-(A) and GSr-like (B) groups. For each RPPA endpoint, a Cox regression model was applied using the intensity levels (z score) and the subtyping as covariates. In order to generate a multi-gene prognostic index (mPI), the full selection of either GSf-(C) or GSr-like (D) RPPA analytes were included as covariates in a Cox model and Assessing more deeply the genes/transcripts that are differentially expressed between our GSf-and GSr-like lines, we found that genes located in chromosomal loci 19p13, 19p12, and 19q13 were highly enriched in the "GSC signature". These regions contain a large number of Kruppel-type Zinc Finger transcription factors (ZNFs), which act mostly as chromatin modulating transcription repressor 20 . Several ZNFs have been identified as potential tumor suppressors and have often been found silenced in several tumor types. Deletions along chromosome 19 are common events in several malignancies including ovarian 38 cancer, esophageal squamous cell carcinoma 39 and oropharyngeal carcinoma 21 . The relevance of chromosome 19 alterations on the pathogenesis and prognosis of GBM deserves further evaluation.
To further investigate the signaling pathways differentially activated in GSf and GSr-like subgroups we performed RPPA analysis. We found that GSf-like lines displayed a significant increase in activation of SRC, MAPK and IGF1R/IR pathways. Conversely, GSr-like lines showed increased levels of phosphorylated proteins associated with the EGFR and PI3K/mTOR pathways.
We then evaluated the expression levels of either GSf-or GSr-like RPPA endpoints in the large cohort of GBM samples from TCGA 9 and compared them with the proneural and mesenchymal subtypes. Although our data on GSCs do not completely match previous RPPA analysis of TCGA GBM tumors, they confirmed the relevance of mTOR pathway in defining molecular subgroups of GBM.
A complete correlation between GSf-like and proneural or GSr-like and mesenchymal subtypes was not present for all common RPPA endpoints, probably due to the different complexity of the systems (i.e. GBM tumors versus GSC lines) however, some of the RPPA analytes shared between TCGA and our data, displayed significant correlations with the clinical outcome of GBM patients. In particular, the GSf-like signature is characterized by the presence of active HER3, previously associated with CD133 + GBM stem cells 40 and recently proposed as a potential target in GBM 32 . The GSf-like signature is also characterized by activation of proteins involved in the focal adhesion signaling axis, which is known to be driven by integrins, a hallmark of GBM stem cells 41 . The activation of HER3 and focal adhesion pathways confers a survival advantage to GBM patients in the TCGA cohort. This observation suggests that GSf-like GSCs might harbor phenotypical and molecular characteristics reminiscent of normal neural stem cells with migratory habit 42 . Conversely, the GSr-like signature is characterized by a strong activation of downstream targets of the EGFR and PI3K/mTOR pathway, which plays a critical role in cancer and, in particular, in GBM 43 . Interestingly, although the expression or activation of EGFR failed to correlate with the outcome in TCGA data, the presence of elevated levels of specific phosphorylation sites of both EGFR and RPS6, significantly associated with poor prognosis in the same cohort of GBM patients. Therefore, the GSr-like signature might be distinctive of a subset of metabolically active GSCs with high resistance to chemo-and radio-therapy hence negatively influencing the survival of GBM patients. Indeed, when classifying TCGA patients based on combined expression patterns of the two RPPA endpoints typical of either the GSfand GSr-like phenotype (i.e. SRC and RPS6, respectively), patients with GSr-like features display a significantly shorter overall survival. Despite the small sample size of our cohort, these results are in line with the clinical data of our patients producing lines of either GSf-or GSr-like groups.
Finally, the GSf-like and GSr-like clusters were clearly confirmed by NMR metabolic profiling. The GSf-like subtype, characterized by metabolites involved in the production of neurotransmitters such as NAA and GABA, is suggestive of a prevalent neuronal metabolism. The GSr-like subtype, characterized by lack of NAA and GABA and by high mobile lipids (MLs), is indicative of a prevalent astroglial-like metabolism. These metabolic profiles corroborate the association of the GSf-like group with the pro-neural GBM subtype resulting from independent gene expression analysis and with a different prognosis. In the clinical practice, NMR spectroscopy is mainly used to differentiate GBM from lower grade gliomas 44 . Hallmarks of GBM are high choline/NAA and choline/ creatinine ratios, and a lactate/lipid peak, which reflects tumor necrosis. Currently, NMR spectroscopy is not used to stratify GBM patients for prognostic purposes. In experimental settings, high glutamate signal has been correlated to IDH1 wild-type GBM 45 . In this study we showed that, at least in a subgroup of patients, the metabolic features of GBM cells, as assessed by NMR analysis, may be informative of the genomic/proteomic landscape of the tumor, and ultimately of the GBM subtype and clinical outcome. Finally, here we provide evidence that combined measurement of levels of two proteins, namely phospho-SRC and phospho-RPS6, may be informative on the outcome of GBM patients. Further investigation with increased sample size could confirm our observations providing important information on the pathogenesis of human GBM and eventually leading to an improvement of GBM diagnosis and to an indication for most appropriate therapeutic strategy.

Methods
Patients, diagnosis, and tumor characterization. Tumor tissue samples were collected from adult patients with GBM tumors (WHO grade IV) undergoing complete or partial surgical resection at the Institute of Neurosurgery, Catholic University School of Medicine in Rome. Informed consent was obtained from the patients before surgery. Experimental protocols were approved by the ethical committee of the Catholic University of Rome. Experiments were performed according to the guidelines approved by ethical committees of the Istituto Superiore di Sanità and Catholic University of Rome. Clinical and pathological features are summarized in Supplementary Table S3.
The expression of the proliferation marker Ki-67 and of Phosphatase and Tensin Homolog (PTEN) were characterized on tumor specimen by immunohistochemistry on deparaffinized sections using the avidin-biotin-peroxidase complex methods (ABC-Elite kit, Vector Laboratories), anti-Ki67 monoclonal antibody (MIB-1, Dako) and anti-PTEN mouse monoclonal antibody (clone 28H6; Novo Castra, Newcastle, United Kingdom). O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation patterns were assessed on genomic DNA extracted from paraffin-embedded tissue by methylation-specific PCR as previously described 15 . Levels of VEGF and EGFRvIII were assessed as previously described 46 .
Establishing GSC cultures. GSCs were isolated through mechanical dissociation of the tumor tissue and cultured in a serum-free medium supplemented with epidermal growth factor and basic fibroblast growth factor as previously described 15 . Cell lines actively proliferating required 3 to 4 weeks to be established. In these conditions, cells grow as clusters of undifferentiated cells, as indicated by morphology and expression of stem cell markers such as CD133, SOX2, Musashi-1, and nestin. The in vivo tumorigenic potential of GBM neurospheres was assayed by intracranial or subcutaneous cell injection in immunocompromised mice. GBM neurospheres were able to generate a tumor identical to the human tumor in antigen expression and histological tissue organization.
Gene expression profiling. For GSC gene expression data collection, total RNA was extracted, labeled and hybridized to the Affymetrix GeneChip1.0ST array (Affymetrix, Santa Clara, CA, USA) according the manufacturer's instructions. Data preprocessing prior to the formal statistical analysis involved standard processes of normalization [robust Multi-array Average (RMA) method]. All data analysis was performed with R (http://www.R-project.org) using Bioconductor 47 . Differentially regulated genes were determined with LIMMA 48 applying default parameters and a FDR-corrected p value cutoff < 0.05.Generation of the unified dataset involved through two consecutive steps. After downloading the data from GEO (GSE23806) 17 , or the TCGA website (https://tcga-data.nci.nih.gov/docs/publications/gbm_exp/) 8 , equivalent "best" probes from the different microarray platforms (GSCs: Affymetrix Gene Chip 1.0S 17 , GSE23806: Affymetrix HG-U133 Plus 2.0 8 , data: LBL202 dataset Affymetrix HuExon) were selected based on the Affymetrix annotation files from the NetAffx website. The combined dataset comprised 15697 probes/genes. In a second step the ComBat batch correction algorithm 49 was used to combine the three different datasets. GSC samples and GSr/GSf samples from GSE23806 were all assigned the same SC qualifier as the first covariate variable. Likewise GBM tumor samples from TCGA and GSE23806 were all assigned a TUMOR qualifier. No explicit assignment of a GSf or GSr phenotype for either GSC or GSE23806 samples was performed during the ComBat normalization procedure. All clustering and heatmap calculation was performed using heatmap.2 of the 'R' package "gplots". For the clustering of the full unified dataset including the TCGA tumor samples the top 100 genes for each tumor subset assignment were chosen from the full 840 gene signature (file TCGA_unified_CORE_ClaNC840.txt at https:// tcga-data.nci.nih.gov/docs/publications/gbm_exp/). Gene set enrichment analysis was based on MSigDB using the GSEA online tool 18,19 hosted by the Broad Institute (http://www.broadinstitute.org/gsea/index.jsp). Overlap of genes differentially expressed between the two GSC sample groups and the Verhaak and colleagues GBM subtype signatures was determined manually in R because a detailed examination of the lists present in MSigDB (VERHAAK_GLIOBLASTOMA_*) revealed a large discordance with the gene/GBM-subtype association present Scientific RepoRts | 6:21557 | DOI: 10.1038/srep21557 in the original file on the TCGA website (https://tcga-data.nci.nih.gov/docs/publications/gbm_exp/ TCGA_uni-fied_CORE_ClaNC840.txt).
1H NMR spectroscopy. GSC 1D 1H NMR spectra were analyzed as previously described 28 . Briefly, cells were washed in PBS, centrifuged and suspended in PBS with 20% D2O and 2 mM Sodium 3-(TriMethyl-Silyl) Propionate 2,2,3,3-d (TMSP) for a frequency standard. Cell suspension were inserted in a 1 mm NMR tube and centrifuged to obtain a final packed cell sample. NMR reagents were purchased from Cambridge Isotope Laboratories, Inc. (CIL Andover MA,USA).
1D and 2D COSY 1H NMR spectra were run at 400.14 MHz on a digital Avance spectrometer equipped with a 1 mm microprobe (Bruker, AG, Darmstadt, Germany). Both 1D and 2D COSY spectra were acquired at T = 298 K. Water suppression in 1D and 2D 1H experiments was obtained using pre-saturation. The measurement of cell samples lasted approximately 210 min (60 min for the 1D experiment and 150 min for the 2D experiment). NMR spectra were characterized by line widths in the range 9-15 Hz, which is typical of metabolite signals in intact cells and in the in vivo brain 50 . Chemical shifts were measured in cells with respect to Lactate methyl signal (Lac) at 1.33 ppm in 1D spectra and to Lac cross peak at 1.33-4.12 ppm in 2D COSY spectra. All NMR parameters were obtained in at least three independent experiments.
Metabolite signal assignments were performed according to indications from literature and by comparison with pure compounds.
1D peak deconvolution and integration as well as 2D COSY cross peak integration were performed by the WINNMR software (Bruker).
Individual integral values were normalized using the methyl group of cytosolic polypeptides at 0.94 ppm as internal reference for 1D spectra while for 2D spectra signal integrals were normalized to the intensity of the lysine cross-peak at 1.70-3.00 ppm.
Reverse-phase protein microarrays. RPPAs were performed as previously described 35  RPPA data analysis was performed on standardized data using dedicated packages of the 'R' software (http:// www.r-project.org/, packages: Bioconductor, Heatplus, ggplot2, plyr, gplots, coin). Briefly, after local background correction, signal for secondary antibody staining alone was subtracted to raw antibody intensities for all spots and further normalization over the corresponding total protein staining allowed correction of potential loading biases. Standardized data were generated individually for each antibodies by subtracting the sample mean and dividing by the standard deviation (z scores). Wilcoxon rank sum test was used to compare RPPA data generated on our GSC lines while unpaired t test was used for statistical analysis of expression levels between P and M subtypes in TCGA level 3 RPPA data downloaded from http://gbm-biodp.nci.nih.gov. Statistical significance was accepted for p values lower than 0.05. TCGA database bioinformatic analyses. In order to investigate GBM RNAseq and RPPA data from the TCGA project, we exploited 'Next-Generation Clustered Heatmaps' available at the Pan-Cancer NG-CHM Compendium website (http://bioinformatics.mdanderson.org/TCGA/NGCHMPortal/), hosted by the bioinformatics department of MD Anderson, University of Texas (Houston, TX, USA). Most of our results for the mRNA expression refer to the 3-Platform Aggregates used by Verhaak and colleagues 8 . To study the relationships between RPPA endpoints associated with GSf-and GSr-like clusters and patient outcome in TCGA data we used the GBM-BioDP software platform 51 (http://gbm-biodp.nci.nih.gov). Briefly, based on (phospho-)proteins that resulted significantly modulated between GSr-and GSf-like GSC clusters in our RPPA analysis, we generated two lists of corresponding gene symbols (Supplementary Table S6) and we queried the GBM-BioDP web platform. For further analysis we selected only those RPPA endpoints shared between our analyses on GSCs and that performed in the TCGA, either as total, phosphorylated or both. Thus, we evaluated whether the levels in P and M samples were in line or not with the RPPA levels measured in GSf-like versus GSr-like in our GSC samples (Supplementary Table S6). Methods used to compare RPPA levels between the P and M subtypes are described in the previous paragraph. The prognostic index for each patients was calculated as the sum of the products of the linear components of the Cox model and the corresponding RPPA values of the endpoints included as covariates in the full model (Fig. 5B,D). Survival analysis in Fig. 6 was performed on TCGA data downloaded directly via 'TCGA Data Matrix' (http://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm) and using as event variable either 'days to death' for deceased patients or the maximum follow-up time for all other patients. Briefly, stratification of TCGA patients in either the GSf-or the GSr-like group was done by applying RPPA expression cut-offs for selected endpoint combinations. To this end we decided to use a subset of RPPA analytes differentially expressed between P and M subtypes or statistically significant covariates (p < 0.05) in single or combined Cox model done either on the full cohort of patients or on P and M subtypes, i.e. EGFR pY1173, RPS6 pS235-36, HER3 pY1289, SRC and SRC pY527. Based on this subset of RPPA endpoints, we tested all possible binary combinations of GSfand GSr-like analytes using median expression level cut-offs, e.g. patients having expression values above the median for either SRC or SRC pY527 or HER3 pY1289 and simultaneous lower than median levels of either RPS6 pS235-36 or EGFR pY1173 were assigned the GSF-like group while patients with opposite pattern were considered as GSr-like. Survival and prognostic index analyses have been performed by means of the survival package of the 'R' software. The results shown in the present manuscript are in part based upon data generated by the TCGA Research Network (http://cancergenome.nih.gov/).