Ventricular-Subventricular Zone Contact by Glioblastoma is Not Associated with Molecular Signatures in Bulk Tumor Data

Whether patients with glioblastoma that contacts the ventricular-subventricular zone stem cell niche (VSVZ + GBM) have a distinct survival profile from VSVZ − GBM patients independent of other known predictors or molecular profiles is unclear. Using multivariate Cox analysis to adjust survival for widely-accepted predictors, hazard ratios (HRs) for overall (OS) and progression free (PFS) survival between VSVZ + GBM and VSVZ − GBM patients were calculated in 170 single-institution patients and 254 patients included in both The Cancer Genome (TCGA) and Imaging (TCIA) atlases. An adjusted, multivariable analysis revealed that VSVZ contact was independently associated with decreased survival in both datasets. TCGA molecular data analyses revealed that VSVZ contact by GBM was independent of mutational, DNA methylation, gene expression, and protein expression signatures in the bulk tumor. Therefore, while survival of GBM patients is independently stratified by VSVZ contact, with VSVZ + GBM patients displaying a poor prognosis, the VSVZ + GBMs do not possess a distinct molecular signature at the bulk sample level. Focused examination of the interplay between the VSVZ microenvironment and subsets of GBM cells proximal to this region is warranted.


TABLES (in
. Demographic, clinical, and molecular information of 254 glioblastoma patients in The Cancer Imaging Archive and The Cancer Genome Atlas. Table S2. Molecular datasets from The Cancer Genome Atlas used together with the number of samples and features analyzed in each dataset. Table S3. Univariate Cox overall survival analyses of VUMC and TCIA/TCGA-GBM datasets. Table S4. List of gene mutations in ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs. Table S5. List of copy number variations in ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs. Table S6. List of genes whose expression varied between ventricular-subventricular zonecontacting GBMs (VSVZ+GBMs) and VSVZ-GBMs. Table S7. List of proteins whose expression by RPPA varied between ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs. Table S8. List of methylation sites beta-values differed between ventricular-subventricular zonecontacting GBMs (VSVZ+GBMs) and VSVZ-GBMs .  Table S9. Comparisons of sets of co-expressed genes obtained through a weighted gene coexpression network analysis in ventricular-subventricular zone-contacting glioblastomas (VSVZ+GBMs) and VSVZ-GBMs as well as in GBMs classified by their canonical transcriptional signature. Table S10. Overlap of clusters obtained through unsupervised consensus clustering with glioblastomas classified by their ventricular-subventricular zone contact (VSVZ+GBM vs. VSVZ-GBMs) and known canonical transcriptional subtypes. Table S11. List of gene mutations in ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type. Table S12. List of copy number variations in ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type. Table S13. List of genes whose expression varied between ventricular-subventricular zonecontacting GBMs (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type. Table S14. List of proteins whose expression by RPPA varied between ventricularsubventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type. Table S15. List of methylation sites beta-values differed between ventricular-subventricular zone-contacting GBMs (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type. Table S16. Comparisons of sets of co-expressed genes obtained through a weighted gene coexpression network analysis in ventricular-subventricular zone-contacting glioblastomas (VSVZ+GBMs) and VSVZ-GBMs that are G-CIMP negative and IDH wild-type as well as in GBMs classified by their canonical transcriptional signature. Table S17. Overlap of clusters obtained through unsupervised consensus clustering with glioblastomas classified by their ventricular-subventricular zone contact (VSVZ+GBM vs. VSVZ-GBMs) that are G-CIMP negative and IDH wild-type and known canonical transcriptional subtypes.  Therefore, the sum of the probabilities of the two dots per sample is 1. C is an example of good predictions made by an artificial model of PLS-LR. Here, samples are represented on the x-axis separated by a vertical dotted line into their binary classification of "0" and "1". The model assigns two probabilities (two dots) to each sample of being 0 or 1. As seen here, the artificial model successfully predicts all the known 0 (red) samples with a high probability of being 0 and a low probability of being 1; and conversely, all the known 1 (green) samples with a high probability of being 1 and a low probability of being 0. D, E, and F represent two-dimensional projection of the high-dimensional gene expression, protein expression, and methylation datasets, respectively, using the t-SNE algorithm (ran with perplexity=30.0, iterations=1000). Application of PLS-LR models (linear) or t-SNE (nonlinear multidimension reduction) on molecular datasets does not distinguish VSVZ+GBMs from VSVZ-GBMs.

SUPPLEMENTAL METHODS Computational Analyses of Molecular Datasets
In addition to single gene expression analysis, a weighted gene co-expression network analysis (WGCNA) was performed using its R package (version 1.49), 6 where profile of each gene coexpression module is represented by an eigengene, which is defined as the first principal component of genes within that module. Weak or spurious correlations in the WGCNA analyses were penalized by powers of 7 to 12 such that the resulting networks had scale-free topology (data not shown, available at request). WGCNA modules were computed using the dynamic tree cut algorithm instead of cutting the tree branches at a constant height due to its superior performance. 7 Module eigengene analysis was performed using standard methods. 6 To detect linear combinations of gene methylations, gene expressions, and protein expressions that could be predictive of VSVZ+GBMs and VSVZ-GBMs, partial least squares followed by logistic regression (PLS-LR)-a technique highly suited for high-dimensional genomic data 8was conducted using the Classification for MicroArrays (CMA version 1.34.0) R package. 9 Monte-Carlo cross validation was used to develop training and testing datasets to avoid artificially overfitting the PLS-LR classifier with two-thirds of the samples being retained for the training dataset and with 100 iterations.
To reveal whether VSVZ+GBMs and VSVZ-GBMs segregate in high-dimensional space, nonlinear dimensionality reduction of the multi-dimensional gene methylation, gene expression, and protein expression datasets was performed using t-SNE. 10 t-SNE embedding was calculated using the Barnes-Hut implementation, available from its author Laurens van der Maaten (https://github.com/lvdmaaten/bhtsne). Linear dimensionality reduction was performed first using principal component analysis; then, nonlinear dimensionality reduction was performed using t-SNE on the top 30 latent principal components.
To reveal whether gene and protein expression clustered based on VSVZ contact in GBMs, clustering was performed in these datasets using the ConsensusClusterPlus (v1.40.0) 11 R package. Consensus indices were calculated using k-means clustering and Pearson correlation, and final consensus clusters were assembled by hierarchical clustering (average linkage). The cumulative distribution function (CDF) plots aided in finding a stable number of clusters. Empirically, the lowest number of clusters, where the CDF plot was the most stable through all the consensus indices, was chosen for analysis (these were 2 and 4 for gene and protein expression, respectively). CDF plots are available upon request. Consensus clusters were compared with VSVZ status of GBMs using the chi-square (Χ 2 ) test of independence.
Except for PLS-LR (used for binary classifications) and nonlinear dimensionality reduction with t-SNE, all other computational methods employed were tested to see whether their results significantly correlated with the known transcriptional GBM subtype classification assigned to each of the samples in the TCIA/TCGA-GBM cohort (neural, proneural, classic, mesenchymal, and G-CIMP). 12 Samples without a designated classification (22 total) were excluded from statistical testing. The R and Python codes used to conduct these analyses are available upon request.