Cell-lineage controlled epigenetic regulation in glioblastoma stem cells determines functionally distinct subgroups and predicts patient survival

There is ample support for developmental regulation of glioblastoma stem cells. To examine how cell lineage controls glioblastoma stem cell function, we present a cross-species epigenome analysis of mouse and human glioblastoma stem cells. We analyze and compare the chromatin-accessibility landscape of nine mouse glioblastoma stem cell cultures of three defined origins and 60 patient-derived glioblastoma stem cell cultures by assay for transposase-accessible chromatin using sequencing. This separates the mouse cultures according to cell of origin and identifies three human glioblastoma stem cell clusters that show overlapping characteristics with each of the mouse groups, and a distribution along an axis of proneural to mesenchymal phenotypes. The epigenetic-based human glioblastoma stem cell clusters display distinct functional properties and can separate patient survival. Cross-species analyses reveals conserved epigenetic regulation of mouse and human glioblastoma stem cells. We conclude that epigenetic control of glioblastoma stem cells primarily is dictated by developmental origin which impacts clinically relevant glioblastoma stem cell properties and patient survival.

G lioblastoma (GBM) is one of the most aggressive cancers and the most frequent and lethal primary malignant brain tumor 1 . Standard therapy of care includes maximal-safe surgical resection, concomitant chemo-and radiotherapy, and adjuvant chemotherapy, yet the 2-year survival is 18.5% 1 . Treatment resistance is explained by extensive genetic and epigenetic tumor cell heterogeneity of GBM, both with regard to intertumor heterogeneity [2][3][4] and intratumor heterogeneity at different regions 5 and in individual cells [6][7][8][9] . Large efforts have been done to converge GBM heterogeneity into biologically and clinically relevant subgroups of GBM. Transcriptome-based stratifications have produced three major isocitrate dehydrogenases 1 and 2 (IDH1 and IDH2) wild-type (wt) GBM subtypes, also called The Cancer Genome Atlas (TCGA) subtypes: proneural (PN), classical (CL), and mesenchymal (MS) 2,10,11 . Studies of patient-derived GSC cultures, clonal derivatives, and single cells have shown the presence of a PN to MS differentiation axis with plasticity of the states 8,12,13 , and a comprehensive GBM single-cell analysis has uncovered additional and dynamic cellular states in GBM tumors 7 . The GBM epigenome has been most frequently analyzed by DNA methylation profiling 3,4,14 and methylomes have proven prognostically more useful than transcriptomes to predict patient survival 3,4 , demonstrating the importance of understanding epigenetic regulation in GBM. The active chromatin landscape of GBM has been investigated with chromatin immunoprecipitation sequencing (ChIP-seq) of acetylated lysine 27 on histone H3 (H3K27ac) in a collection of primary tumors and GSC cultures 15 , and by the assay for transposase-accessible chromatin using sequencing (ATAC-seq) 8,16,17 , which have uncovered subgroups of GBM suggested to be regulated by different sets of transcription factors (TFs).
Several studies have implied a connection between GBM molecular subgroups and developmental origin 2,18 . Methylation profiling has proven particularly useful to connect primary tumors with their tissue of origin 19 and has been used to separate GBM with higher resolution than gene expression 4,20 . We have shown by experimental modeling of GBM that developmental state and age of the cell of origin could affect its vulnerability to GBM development 21 , and that it shaped the phenotype of the resulting GBM stem cells (GSCs) 21,22 . Tumors were induced by the same oncogenic events in three different mouse cell lineages which produced contrasting tumor cell phenotypes with regard to malignancy and drug sensitivity, where a more differentiated origin promoted a less tumorigenic but more drug resistant mouse GSC (mGSC) phenotype 22 . Through a cross-species GSCbased stratification approach applying the mouse cell of origin (MCO) gene signature of differentially expressed genes on a large collection of human GSC (hGSC) cultures, we found that developmental origin could be used to stratify functionally distinct groups of patient-derived GSC cultures 22 . A recent similar crossspecies approach has further corroborated the importance of cell lineage origin in GBM 23 . In all this has demonstrated that intertumor heterogeneity to a large extent is shaped by the intrinsic properties of the GBM cell of origin which result in highly dynamic GSCs that basically evade all current therapies.
Here we show a cross-species epigenome analysis where the chromatin accessibility landscape of 9 mGSC cultures of defined developmental origin and 60 IDH wt hGSC cultures are analyzed with high-sensitivity ATAC-seq 24 . We relate the results to a range of molecular and functional data and show that genome-wide chromatin accessibility separates both mouse and human GSC cultures into three functionally distinct subgroups. The mGSC groups are divided by developmental origin and display shared molecular features with each of the three hGSC groups along a PN to MS axis. Cross-species analyses support the hGSC subgroups to be cell lineage controlled and show a conservation of enriched TF motifs in the differential chromatin-accessible regions. Importantly, the ATAC-seq-based stratification can separate patients with significantly different survival pointing to the ability of this analysis to distinguish clinically relevant tumor cell properties.

Results
Chromatin accessibility in mouse GSCs predicts developmental origin. We performed ATAC-seq analysis of nine previously established mGSC cultures (mGC1 GFAP : SC81, SC83, SC84; mGC2 NES : SC50, SC52, SC64; mGC3 CNP : SC37, SC74, SC112) derived from GBMs induced by the same oncogene (PDGFB) in three different cell lineages in adult tv-a transgenic mice (Gfap/tva;Arf−/−, Nes/tv-a;Arf−/−, and Cnp/tv-a;Arf−/−, respectively) 22 ( Fig. 1a and Supplementary Data 1). The cell of origin in these mice had previously been deduced to be a neural stem cell (NSC)like cell in G/tv-a mice, an astrocyte precursor cell-like cell in N/tva mice and an oligodendrocyte precursor cell (OPC)-like cell in C/ tv-a mice 22 , and cell cultures of different origin had displayed distinct functional, phenotypic, and transcriptomic properties 22 . The ATAC-seq data from all samples were determined to be of high quality based on analyses of enrichment of sequence reads at transcription start sites (TSS) (Fig. 1b), fraction of reads in peaks (FRiP) (Fig. 1c) and reproducibility among the technical replicates ( Supplementary Fig. 1a). Genomic annotation of ATAC-seq data displayed some variation across cultures, but showed high reproducibility between replicates ( Supplementary Fig. 1b). In all this supported a high quality of the mouse ATAC-seq data.
Previous principal component analysis (PCA) of gene expression array data from the same mGSC cultures had shown a clear separation based on developmental origin 22 . PCA analysis of global ATAC-seq data also distinguished cell of origin groups and produced a clear separation between NSC and GSC cultures (Fig. 1d). To understand the underlying molecular regulation of the mGSC groups we extracted the differentially enriched ATACseq peaks (Log 2 (Fold change (FC)) > 1, false discovery rate (FDR) < 0.05) of each group ( Fig. 1e and Supplementary Data 2), which produced 819 peaks for mGC1 GFAP , 1161 peaks for mGC2 NES and 95 peaks for mGC3 CNP . The differential ATAC peaks were annotated to their nearest genes (Supplementary Data 2), where some examples are indicated for each of the mGSC groups (Fig. 1e). Representative genome tracks for each group validated the differential chromatin accessibility of annotated genes (Fig. 1f). Many genes of the mGC2 NES group showed a clear mesenchymal character (e.g., Cd44, Bmp7, Tgfb2) which is in line with the previous finding that mGC2 NES cells were most closely related to the MS GBM subtype, while both the mGC1 GFAP and mGC3 CNP cells were most similar to PN GBM 22 . To widen our understanding of the characteristics of each mGSC group we performed Gene ontology (GO) enrichment analysis of the annotated genes ( Supplementary Fig. 1c-f). The top-15 GO terms selected by strength and ranked by FDR were for mGC1 GFAP mainly related to voltage gated ion channels and RAS GTPase activity for molecular function ( Supplementary  Fig. 1c), and to synaptic functions and adult responses to neuronal stimuli for biological processes ( Supplementary Fig. 1d). For mGC2 NES the majority of molecular functions were involved in protein binding and RNA polymerase II DNA binding ( Supplementary Fig. 1e) and for biological processes the GO terms were centered around epithelial and mesenchymal cells ( Supplementary Fig. 1f). For mGC3 CNP the number of genes were small and there were no significant GO results. Taken together, also these results supported a mesenchymal character of the mGC2 cells and a neural character of mGC1 cells.
To discover the differential regulatory motifs between cultures of different origin we performed TF motif analysis of the unique accessible chromatin regions and extracted the unique TFs among the mGSC groups ( Fig. 1g and Supplementary Data 3). In line with having the highest number of unique ATAC regions mGC2 NES samples also produced the largest number of uniquely enriched TF motifs. Overall, the result suggested different regulatory circuits between the mGSC groups, which was further sustained when extracting all significantly enriched and variable TF motifs of each group ( Supplementary Fig. 1g-i). These were highly overlapping with the unique TF motifs (Fig. 1g) Fig. 1 The chromatin accessibility landscape of mouse GSC cultures can predict cell of origin. a Overview of mouse GBM models and GSC cultures of different origin that were analyzed by ATAC-seq. Schematic figures were produced by the authors. b Enrichment of ATAC-seq reads at transcription start sites (TSS) from the mGSC cultures. produced a clear overlap between mGC1 GFAP and mGC3 CNP , and mGC3 CNP and mGC2 NES , while clearly separating mGC1 GFAP and mGC2 NES .
In summary, the ATAC-seq analysis of mGSC cultures produced, in accordance with our previous gene expression analysis, distinct cell of origin groups. Analysis of TF motifs in differential accessible chromatin supported distinct differences between mGC1 GFAP and mGC2 NES while there was some overlap between mGC3 CNP and mGC1 GFAP or mGC2 NES . This proposed that although all mGSC cultures have the same driver mutations (PDGFB overexpression and p19 Arf deletion) there are important developmentally inherited mechanisms that regulate their previously shown 22 different phenotypes.
Heterogeneous chromatin accessibility across 60 patientderived GSC cultures. Next, we investigated the chromatin accessibility landscape in GSC cultures of our local human GBM cell culture (HGCC) biobank 22,25,26 . We performed ATAC-seq on 60 patient-derived IDH wt GSC cultures ( Fig. 2a and Supplementary Data 1). We applied established and stringent criteria for the ATAC-seq data processing and could show high Pearson correlation coefficient (0.8-0.98) for technical replicates, TSS enrichment scores above 3.8, and FRiP of all duplicate samples at or above 20% 27 ( Supplementary Fig. 2a-c). We performed a saturation analysis using random sampling non-linear regression to analyze the fraction of all predicted accessible chromatin regions that we could expect to detect with 60 cultures (Fig. 2b). This showed that our sample size was large enough and that all predicted regions of accessible chromatin would be detected with 47 cultures. In total, we had captured 323526 ATAC peaks from our hGSC cohort. To obtain an overview of the global chromatin accessibility landscape of the samples we identified all unique chromatin-accessible regions in the entire data and calculated for each region the number of samples it was present in (Fig. 2c). A large proportion (25.4%) of ATAC peaks was only detected in one hGSC culture, and just 1.5% of the accessible regions were common to all 60 cultures. Nonetheless, unique chromatinregions showed increased openness the more frequently they were present across the samples ( Supplementary Fig. 2d). SOX2 showed an overall high chromatin accessibility across the cohort (Fig. 2d) in line with previous data showing SOX2 expression in all HGCC cultures investigated 28 . Yet, individual cultures showed a clear variability in chromatin openness of this locus (Fig. 2d). Inter-culture heterogeneity was further sustained by separate analyses of promoter (−1 kbp to +100 bp of TSS) and nonpromoter (also called distal regulatory element (DRE)) regions of GSC metamodule genes 7 and MCO genes 22 (Fig. 2e and Supplementary Fig. 2e). Also structural genomic annotation of the ATAC data showed a clear variation in chromatin-openness among the 60 cultures (Fig. 2f). Taken together, this demonstrated that our cohort of 60 patient-derived GSC cultures displayed a highly heterogeneous chromatin accessibility landscape in line with the high GBM inter-patient diversity.
Chromatin-accessibility robustly identifies three clusters of patient-derived GSC cultures. To identify unifying features of the hGSC cohort we performed non-negative matrix factorization (NMF) analysis on the ATAC-seq data, which produced three clusters: ATAC60-C1 (n = 22), ATAC60-C2 (n = 16), and ATAC60-C3 (n = 22) ( Supplementary Fig. 3a, b and Supplementary Data 1). A large part of the HGCC cultures had previously been classified based on gene expression according to the TCGA subtypes 28 and with the MCO gene signature (MCO1-3) 22 . To compare the ATAC-seq clusters with the TCGA and MCO classifications we excluded hGSC samples lacking such information and re-analyzed 50 samples with NMF. This produced, again, three clusters: ATAC50 C1 (n = 19), ATAC50 C2 (n = 14), and ATAC50 C3 (n = 17) ( Fig. 3a and Supplementary  Fig. 3c). Comparing ATAC50 to ATAC60 clusters showed that only three samples had changed cluster in ATAC50 (Supplementary Fig. 3d and Supplementary Data 1), which showed a robustness of the chromatin accessibility-based clustering. From hereon we focus mainly on the ATAC50 classification.
When we compared the ATAC50 clusters with the TCGA subtypes there was little overlap (Fig. 3b). The majority of PN and CL cultures were in C1 while MS cultures basically were divided between C2 and C3. Comparing to MCO showed a higher degree of overlap (Fig. 3c), likely reflecting the relation between developmental origin and epigenetic state of GSC.
Next, we extracted the unique chromatin-accessible regions for each ATAC50 cluster with DESeq2 (Log 2 (FC) > 1, FDR < 0.01, peak average intensity >30, and coefficient of variance <0.2; Fig. 3d). By this we identified 4023 regions in C1, 5547 in C2, and 949 in C3 ( Fig. 3d and Supplementary Data 4). The genomic features of all chromatin-accessible regions from each ATAC50 cluster were annotated using the chromatin state discovery and characterization software (ChromHMM) 29 (Fig. 3e and Supplementary Fig. 3e). There was a diverse distribution of chromatin states with some marked differences between ATAC50 clusters. C1 had the largest proportion of active promoter regions (H3K4me3 and H3K27ac), C2 occupied a higher proportion of active regions (H3K27ac), and C3 had a higher frequency of weak enhancer regions (H3K4me1). Common to all three clusters was that the combined proportion of non-promoter regions (strong and weak enhancer regions) constituted the biggest proportion of chromatin states, clearly larger compared to the distribution in the whole ATAC-seq data (Fig. 3e). This indicated, as for mGSCs, that DRE regions were central in defining the ATAC50 clusters. To test our hypothesis, we performed Pearson correlation hierarchal clustering of all ATAC peaks, of DRE regions only, and of promoter regions only ( Supplementary Fig. 3f). All ATAC peaks and DRE peaks displayed a similar dynamic range of chromatin openness and cluster patterns, while the dynamic range of promoter regions was smaller which supported our assumption. NMF clustering of DRE ATAC peaks ( Fig. 3f and Supplementary Fig. 3g) produced almost identical clusters as ATAC50 (Fig. 3g). Clustering promoter regions resulted in three clusters ( Supplementary Fig. 3i) that were entirely different and non-overlapping with ATAC50 ( Supplementary Fig. 3j). Collectively, our analyses showed that chromatin accessibility could robustly stratify hGSC cultures and clusters were predominantly dictated by the DRE regions. The high correspondence of the ATAC50 clusters with the MCO stratification implied an important role of cell lineage-controlled gene regulation of human GSC cultures.
ATAC50 clusters are phenotypically distinct. To phenotypically characterize the ATAC50 clusters we first used hGSC gene expression array data 22 and analyzed the 256 GSC meta module genes 7 across the 50 cultures ( Fig. 4a and Supplementary Fig. 4). While only 53 genes showed a significant difference between the clusters (Fig. 4b) there were clear differences comparing global meta module gene expression (Fig. 4a). C1 showed significantly higher expression of NPC1, NPC2, and OPC genes compared to both C2 and C3, significantly higher expression of AC genes compared to C2, and significantly lower expression of MES genes compared to both C2 and C3. Thus, C1 and C2 were always at the end of the spectrum with C3 in the middle. Notably, C3 showed significantly higher expression of NPC1, OPC, and AC genes and significantly lower expression of MES2 genes compared to C2. This suggested that ATAC50 clusters were separated along a gradient of GSC states with C1 being progenitor cell-like, C2 being mesenchymal-like, and C3 being intermediate.
Next, we analyzed chromatin openness of metamodule genes in promoter regions (Fig. 4c) and DRE regions (Fig. 4d). The openness of DRE regions was significantly different between all clusters in all metamodules whereas promoter regions showed less distinct differences for NPC1, NPC2, OPC, and AC metamodules and were nonsignificant for MES1 and MES2. This is in line with the dominant role of DRE regions to separate the ATAC50 clusters (Fig. 3f). Although the differences between ATAC50 groups in most of the comparisons in Fig. 4a, c, d were statistically significant the box plots were still highly overlapping reflecting the extensive intertumor heterogeneity of GBM.
Of the metamodule genes with significant different gene expression (Fig. 4b) the majority (39) were higher expressed in C1 and belonged to the NPC1, NPC2, OPC, and AC modules. C2 showed a higher expression of some MES1 and MES2 genes, and C3 cultures displayed higher expression of a few NPC1, NPC2, and AC genes. To investigate the chromatin openness of DRE regions of these genes we first linked all human ATAC DRE regions with their nearest gene through a peak-to-gene linking prediction analysis 27 [31][32][33][34]39 . When analyzing the relationship of TF motif openness and TF gene expression we found a positive correlation for several TFs (Fig. 4g) which strengthened their involvement in shaping the cluster phenotypes. To further analyze TF regulation we performed TF footprint analysis of differential ATAC peaks, which is a computational method to predict TF binding ( Fig. 4h and Supplementary Fig. 5c). We found that ATAC50 C1 showed significantly higher occupancy of OLIG2, TCF12, ASCL1, and NEUROD1 compared to C2 and C3, while C2 and C3 showed significantly higher occupancy for NF-E2, JUNB, and FOSL2 compared to C1. Among the C3-uniquely enriched TF motifs there were no significant footprints, but among the top-50 variable TF motifs TEAD3 showed significantly higher occupancy for C3 compared to C1 (Fig. 4h). In all, the TF motif enrichment, TF gene expression, and TF occupancy analyses confirmed the phenotypic differences and revealed distinct epigenetic regulation of the ATAC50 clusters.
ATAC50 classification produce functional separation of hGSC cultures. Since the transcriptome and ATAC-seq defined mGSC groups had displayed different functional properties 22 , we asked if the ATAC50 clusters also would. We investigated essential GSC properties in 16 C1, 8 C2, and 13 C3 cultures (Fig. 5), of which the majority had overlapping MCO and ATAC50 classifications (Supplementary Data 1). As a reference we also analyzed all data by grouping the cultures according to TCGA subtype (Supplementary Fig 6 and Fig. 6b). We first performed consecutive sphere-forming assays under clonal conditions ( Fig. 5a and Supplementary Fig. 6a). C1 cultures displayed the highest sphereforming ability while there was no significant difference between C2 and C3 cultures, although C3 cultures produced a higher average number of spheres (Fig. 5a). TCGA grouping also produced significant differences between the MS subtype and CL or PN (Supplementary Fig. 6a). Extreme limiting dilution assay (ELDA) is considered a more reliable and objective measurement of self-renewal and showed that C1 cultures had a significantly higher self-renewal capacity compared to both C2 and C3. However, this method also captured a significant difference between C2 and C3 (Fig. 5b). With this method TCGA groups only captured a difference between MS and the other two subtypes ( Supplementary Fig. 6b). Cell proliferation was measured by BrdU incorporation and here C1 cultures showed significantly higher proliferation compared to both C2 and C3 cultures (Fig. 5c). For this trait TCGA classification could distinguish a significantly lower proliferative capacity of MS cells compared CL and PN ( Supplementary Fig. 6c). Tumor cell invasion was analyzed with the spheroid collagen gel invasion assay. Here we found that C2 cultures were significantly more invasive than both C1 and C3 cultures (Fig. 5d), while TCGA subtypes could separate the MS and CL groups only ( Supplementary Fig. 6d). In all, the functional characteristics were in accordance with the stem and progenitor cell-like molecular phenotype of C1 cultures, the mesenchymal-like phenotype of C2 culture and the mixed molecular phenotype of the C3 cultures. It also showed that ATAC50 was superior to TCGA in predicting two key features of GSCs, self-renewal and invasion. We also analyzed the drug response phenotype of 11 C1, 7 C2, and 10 C3 cultures by measuring cell viability after 72 h exposure to a collection of 28 anticancer drugs at seven different concentrations ( Fig. 5e-g and Supplementary Data 8). This produced dose-response curves that were converted to area under the curve (AUC) measures that were compared pair-wise between clusters. There was a clear overall higher sensitivity of C1 cultures to the compounds compared to both C2 (Fig. 5e) and C3 (Fig. 5f) cultures. All drugs that produced a significantly different response between C1 and C2 or C3 cultures were more effective in C1 cultures. These comparisons identified two compounds as particularly efficient for C1 cultures, Melflufen (alkylating), and PD173074 (FGFR1 inhibitor). When comparing C2 to C3 cultures, C3 cultures were clearly, overall, more sensitive to the tested drugs (Fig. 5g). However, C3 cultures showed a significantly higher resistance to two drugs, 5-azacytidine and 6-thioguanine, compared to both C1 and C2 cultures (Fig. 5f, g). When we compared drug responses between TCGA groups the result was less pronounced (Supplementary Fig. 6e-g). We found that PN cultures had an overall higher sensitivity to these compounds compared to MS cultures ( Supplementary Fig. 6e) while there were small differences between PN and CL ( Supplementary Fig. 6f), and MS and CL (Supplementary Fig. 6g). The distinct drug response phenotypes of the ATAC50 clusters suggested that cell lineage dependencies are important to account for when developing therapeutic strategies for GBM. Fig. 4 ATAC50 clusters are phenotypically distinct. a Comparison between ATAC50 clusters of average gene expression of all genes in each GSC meta module. b Meta module genes with significantly different gene expression between ATAC50 clusters. c Comparison between ATAC50 clusters of average chromatin accessibility at promoters of all genes in each GSC meta module. d Comparison between ATAC50 clusters of average chromatin accessibility at DRE regions of all genes in each GSC meta module. The box plots in the violin plots (a, c, d) show the minima (bottom dot), the maxima (top dot), the median (middle line), and the first and third quartiles (boxes), whereas the whiskers show 1.5× the interquartile range IQR above and below the box. For GSC cell line numbers: ATAC50 C1 (n = 19), ATAC50 C2 (n = 14), and ATAC50 C3 (n = 17). In plots (a, c, d), 256 GSC metamodule genes were used to compared. Two-sided Welch's t-test was performed on all pair-wise comparisons in (a, c, d), significant differences are indicated with p value. e Genome browser tracks of the average ATAC-seq signal in each ATAC50 cluster for ASCL1, CTSB, DPYSL2. f Heatmap of significantly enriched cluster-unique TF motifs. g Scatter plots of TF motif chromatin accessibility (x-axis, normalized chromatin openness) and TF gene expression (y-axis, microarray data, counts per million reads mapped (CPM)). All significantly enriched TF motifs in (f) were analyzed and those with a significant positive correlation are shown. R = Pearson correlation, p calculated by two-sided Welch's t-test. Ninety-five percent confidence intervals are indicated with shaded areas. h TF footprint analysis of cluster-specific ATAC50 peaks. Friedman-Nemenyi test was performed, p < 0.05 for all. The p value for each TF is labeled. Source data are provided as a Source Data file for Fig. 4a-d, f-h.

ATAC-based clusters can predict mouse and patient survival.
Orthotopic tumor growth is a defining capacity of cancer stem cells. We used in vivo survival data, in total 322 intracranially injected immune-deficient mice, from published 22,25,26,28 and unpublished experiments, and included only individuals that had been killed because of disease symptoms before the experimental endpoint (Supplementary Data 9). When grouping mice according to the ATAC50 clusters we found a significant difference in survival between all groups with C1 being most aggressive (Fig. 6a). The same data divided by TCGA subtypes showed significant differences between MS and CL or PN but could not separate all three groups (Fig. 6b).
We also analyzed survival of GBM patients from whom the hGSC cultures had been derived (Fig. 6c-f). When ATAC50 patients were grouped according to TCGA there were no survival differences (Fig. 6c), which is in line with previous patient data for IDH wt GBM. ATAC50 clusters produced more separated curves (Fig. 6d), although there were no significant differences. We also analyzed the MCO clusters because of the high degree of overlap with ATAC50 (Fig. 6e), which produced even more separated curves and a close to significant difference between MCO2 and MCO3 patients (Fig. 6e). Finally, we used all patients and the ATAC60 classification, which for the overlapping 50 hGSC cultures had produced essentially the same clusters as    Fig. 3d). By adding ten more patients the curves between C2 and C3 patients became clearly significant and showed improved survival for C3 patients (Fig. 6f). Because of the clinically relevant ATAC60 patient stratification we reanalyzed the results of Figs. 5 and 6a using ATAC60 groups ( Supplementary Fig. 7). This showed overall the same result as with ATAC50 with small changes in significance levels for proliferation ( Supplementary Fig. 7c, slightly decreased between C1 and C2), invasion ( Supplementary Fig. 7d, slightly increased between C2 and C1 or C3), drug response (Supplementary Fig. 7f-g, slightly decreased between C3 and C1 or C2), and in vivo tumorigenicity ( Supplementary Fig. 7h, slightly increased between C2 and C3) which supports the robustness of the ATAC clustering.
In conclusion, TCGA subtypes had no predictive value while both the MCO-and ATAC-classifications could separate patients of the two most molecularly and functionally similar GBM subgroups. The substantial overlap of these classifications suggests that cell lineage-based classifications could be valuable to predict GBM patient survival, likely because of their ability to distinguish essential tumor cell phenotypes.
Cross-species analyses reveal MCO prediction of hGSC ATAC clusters. By a glance the MCO groups and ATAC50 clusters had shown certain similarities such as a PN-like phenotype of mGC1 GFAP and ATAC50 C1, and a MS-like phenotype of mGC2 NES and ATAC50 C2. To investigate this further we performed a cross-species comparison of enriched TF motifs in unique ATAC peaks (Supplementary Data 3 and 6) of both species (Supplementary Fig. 8a-c). We analyzed the positive correlation of the significantly enriched and variable TF motifs of mGC1 GFAP (Supplementary Fig. 8a), mGC2 NES ( Supplementary  Fig. 8b), and mGC3 CNP (Supplementary Fig. 8c) separately with the complete list of enriched TF motifs from the ATAC50 data.
Finally, we analyzed the relation of the cell lineage-based MCO stratification with ATAC50. We had already observed a considerable overlap of ATAC50 and MCO (Fig. 7a) and wanted to investigate the underlying reason. We started by analyzing if the chromatin landscape of the MCO genes could guide the ATAC50 clusters. Of the 196 MCO genes 22 we used 166 human homologs for which the ATAC-seq peaks of promoter regions were extracted and analyzed by NMF ( Supplementary Fig. 8d, e). This produced a poor overlap with the ATAC50 clusters (Fig. 7b), consistent with the importance of DRE regions (Fig. 3h). Next we used the DRE regions of the MCO genes through the peak-togene linking prediction analysis (Supplementary Data 5). The MCO human homolog genes were annotated to 786 ATAC peaks of DRE regions that were analyzed by NMF ( Supplementary  Fig. 8f, g). This showed a higher concordance with ATAC50 (Fig. 7c) compared to promoter-guided clusters (Fig. 7b) but still lower than the MCO stratification (Fig. 7a). Then we used the cell of origin-specific mouse ATAC peaks (n = 2075, Fig. 1e  present in the peaks-to-genes data and could be linked to 4834 human ATAC peaks. These peaks were used in NMF (Supplementary Fig. 8h, i) which produced clusters that, surprisingly, showed a very high agreement with the ATAC50 clusters ( Fig. 7d). As reference we compared this to 1000 NMF analyses of at each time 4834 randomly selected ATAC peaks from the peaks-to-genes data, which did not reproduce the ATAC50 clusters ( Supplementary Fig. 8j). To investigate the regulation of   the open chromatin in the mouse and human ATAC data used in the mouse-dictated human ATAC50 clusters (Fig. 7d), we compared enriched TF motifs in the 2075 mouse ATAC peaks to those of the corresponding 4834 human ATAC peaks. This showed an 90% overlap of the top-10 significantly enriched TF motifs (Fig. 7e, f). As a reference, we compared this to a 1000 times repeated control experiment where enriched TF motifs in 2075 randomly selected mouse ATAC peaks and their corresponding human ATAC peaks were compared ( Supplementary  Fig. 8k). The TF motif overlap extended from 0 to 7 with an average overlap of 18.3% (standard deviation = ±18.7%). Our cross-species analyses supported a molecular relation of the mGSC groups with the ATAC50 clusters, where the core set of overlapping TF motifs in differential mouse and human ATAC peaks showed a connection between mGC1 GFAP and ATAC50 C1, mGC2 NES and ATAC50 C2, and the intermediate characteristics of mGC3 CNP and ATAC50 C3 (Supplementary Fig. 8a). The predictive capacity of the differential mouse ATAC peaks to produce almost identical hGSC clusters as ATAC50 (Fig. 7d) that displayed common TF regulatory mechanisms with the mGSC groups (Fig. 7e, f) supported the role of neurodevelopmental processes in GBM. In all, this provides strong support for the presence of a conserved cell of origin-determined epigenetic regulation of GBM.

Discussion
Cross-species analyses of transcriptomes have been performed for many cancers, including primary malignant brain tumors such as medulloblastoma 40 , ependymoma 41 , and GBM 22,23 , and have helped to extend our knowledge about how developmental mechanisms contribute to cancer. With increasing knowledge about the central role of epigenetic regulation in GBM and GSCs we set out to perform a cross-species analysis of chromatinaccessibility in mouse and human GSC cultures that we had previously analyzed by cross-species transcriptome analysis 22 . The contribution of developmental regulation in GBM biology and GSC function remains to a large extent to be deciphered and we reasoned that it would be a strength to use the same extensively characterized mouse and human GBM models to be able to compare the previous transcriptome data with the new genomewide chromatin-accessibility data.
ATAC-seq analysis divided both mouse and human cells in three molecularly and functionally distinct groups, and for both species these groups were mainly determined by the chromatin state of DRE regions and regulated by contrasting sets of TFs. The three mGSC cell of origin groups were arranged, as we had already shown for transcriptome data 22 , in a gradient of PN to MS phenotypes with mGC1 GFAP and mGC2 NES at each end and mGC3 CNP in between. The three hGSC ATAC clusters aligned well with recent ATAC-seq analyses of human GBM cells and GSC cultures from independent cohorts 8,16 and supported our non-linear regression analysis which predicted our cohort to be large enough to capture the spectrum of GBM inter-patient heterogeneity. The hGSC ATAC clusters were similarly to the mouse cultures found along the PN to MS axis where expression of GSC state markers 7 and analyses of TF circuits showed that C1 cultures were glial stem and progenitor cell-like, C2 profoundly mesenchymal-like, and C3 intermediate with mostly astrocytic and mesenchymal traits. The cross-species analysis of enriched TF motifs in unique ATAC peaks of mouse and human data strengthened the connection between mGC1 GFAP and ATAC50 C1, mGC2 NES and ATAC50 C2, and mGC3 CNP and ATAC50 C3. As previously shown for mGSC cultures of different origin 22 , also ATAC50 clusters were functionally well-defined, where C1 cultures were most self-renewing, proliferative, tumorigenic, and drug-sensitive, C2 cultures most invasive and drug-resistant, and C3 cultures least invasive and tumorigenic. Importantly, we found that the functional differences between C2 and C3 also were reflected in significantly different patient survival. The precision of chromatin accessibility to separate prognostic patient groups emphasized the importance of epigenetic regulation in GBM.
The considerable overlap of the MCO classification with the ATAC50 clusters implied a cell lineage-controlled regulation of hGSCs. This was corroborated by a second cross-species analysis where we used the unique chromatin-accessible regions of the mGSC data and converted those to chromatin-accessible regions of hGSCs which almost completely could re-establish the ATAC50 clusters. The 90% overlap of top-10 enriched TF motifs in the mouse and human accessible chromatin regions provided strong support for a conserved epigenetic cell lineage regulation of GBM. Our cross-species analyses also validated that our three PDGF-driven tv-a mouse GBM models can produce tumors representative of the breadth of developmental regulation present in our large collection of patient-derived GSC cultures. The fact that one oncogenic driver (PDGFRA activation) could reproduce the epigenetic heterogeneity of human GBM was in line with results from the comprehensive single cell RNA-seq analysis of human GBM 7 where multiple cellular states were shown to be present in all investigated tumors, while state distributions were proposed to be dictated by certain genetic factors such as PDGFRA. Taken together, this would argue for that GBM epigenetic heterogeneity is mainly the consequence cell of origininherited developmental regulation which in turn provide the basis for possible GSC states, where GBM driver mutations determine the state transition dynamics.
We show the power of a chromatin accessibility-based functional classification of GSCs. Continued work to identify the key regulatory elements in the DREs dictating the different properties and common features of the epigenetic clusters, and to validate key TF circuits regulating GSC states by perturbation strategies will be crucial to pinpoint therapeutic targets. Our analysis of chromatin accessibility in mGSCs and hGSCs has revealed a species conservation of the GBM epigenome and demonstrated the importance of cell lineage diversity for accurate in vivo modeling of GBM inter-patient heterogeneity.
Human GBM cell cultures. All 60 human GSC cultures used in this study have been established in our laboratory and are part of the HGCC biobank (hgcc.se). Handling of human tissues and data were performed in accordance with the protocol approved by Uppsala ethical review board (2007/353) and following informed written consent from all patients. All have been previously described and most have been authenticated 22,25,26 . Cultures were maintained on poly-ornithine/ laminin-coated dishes in DMEM/F12 Glutamax (Gibco) and Neurobasal medium (Gibco) mixed 1:1 with addition of 1% B27 (Invitrogen), 0.5% N2 (Invitrogen), 1% penicillin/streptomycin (Sigma), 10 ng/ml each of EGF and FGF2 (Peprotech). Human cells used in all experiments were below passage 20. All human cell cultures are listed in Supplementary Data 1.
All cell cultures, both mouse and human, have been regularly analyzed for mycoplasma infection using either a PCR-based method with the primers Myco1 (50-GGCGAATGGGTGAGTAACACG) and Myco2 (50-CGGATAACGCTTGC GACTATG) (Invitrogen), or the KAPA kit (Techtum, cat# 25-KK7352), and have at all times tested negative.
Human GSCs and mouse NSCs were fixed with 1% formaldehyde (Thermo Fisher Scientific, 28906) for 10 min and quenched with 0.125 M glycine for 5 min at room temperature. After the fixation, ATAC-seq was performed as previous described 44 . Cells were counted and 50,000 cells were used per ATAC-seq reaction. The transposition reaction followed the normal ATAC-seq protocol. After transposition, a reverse crosslink solution (final concentration 50 mM Tris-Cl PH 8.0 (Invitrogen, 15568-025), 1 mM EDTA (Invitrogen, AM9290G), 1% SDS (Invitrogen, 15553-035), 0.2 M NaCl (Invitrogen, AM9759) and 5 ng/μl proteinase K (Thermo Scientific, EO0491)) was added up to 200 µl. The mixture was incubated at 65°C with 1200 rpm shaking in a heat block overnight, then purified with MinElute PCR Purification kit (Qiagen, 28004) and eluted in 10 µl Qiagen EB elution buffer. Sequencing libraries were prepared following the original ATAC-seq protocol 43 . All sequencing was performed on Illumina NovaSeq 6000, and at least 20 million paired-end sequencing reads were generated for each ATAC-seq library.
ATAC-seq data processing and quality analyses. All ATAC-seq data were processed with same pipeline described below. After the Adapter sequence trimming, the ATAC-seq sequencing reads were mapped to genome hg19 (for human GSCs) or mm9 (for mouse GSCs and mouse NSCs) using bowtie2 (ref. 45 ). Mapped paired reads were corrected for the Tn5 cleavage position with shifting +4/−5 bp depending on the strand of reads. All mapped reads were extended to 50 bp centered by Tn5 offset. The PCR duplication were removed using Picard (http:// broadinstitute.github.io/picard/) and sequencing reads from chromosome M were removed. The Peak calling of each ATAC-seq library was performed with MASC2 46 with parameters -f BED, -g hs, -q 0.01, -nomodel, -shift 0. Peaks were merged into matrix with bedtools merge 47 . Raw reads within peaks were normalized using EdgeR's cpm 48 . Log transformation were applied on these normalized peaks to calculate the pearson correlation among duplicates. Unique ATAC peaks for hGSC ATAC clusters were selected using DESeq2 (ref. 49 ), with cutoff p value <0.01, FDR < 0.01, log2 fold change > 1, peak average intensity > 30, and coefficient of variance < 0.2. Mouse differential ATAC-seq peaks were identified by comparing each mGSC group with each other using the parameters log2(fold change) > 1, false discovery rate < 0.05.
The ATAC-seq saturation analysis from human GSC was performed by randomly selecting samples and successively calculating the number of peaks identified within the number of samples. The self-starting non-linear mode has been used to predicate the saturation point. Non-linear model: where P(x) represents predicted numbers of peaks, x corresponds the actual number of peaks, a, b, c, and d represent the parameters for self-starting simulation.
For ATAC-seq peak visualization, Washu Epigenome Browser was used to visualize these presentative peak regions from mouse NSC, mGSC and hGSC.
To analyze the chromatin accessibility signal per gene, the accessibility of the regions (no further than the window −1000 to +100 bp from a transcriptional start site) were defined as promoter regions and the elements (located more than 3 kbp from a TSS and no further than 500 kbp) were represented as DREs. The genomic annotation of ATAC-seq was performed with seven genomic features: 3′ UTR, 5′ UTR, exon, intergenic region, intron, TSS, and TTS using ChIPseeker 50 .
For ATAC-seq peak visualization, Washu Epigenome Browser was used to visualize these presentative peak regions from mouse NSC, mGSC, and hGSC.
Annotation of unique ATAC peaks to genes and GO analysis. Genomic annotation of each ATAC peak to its nearest gene for mouse (Supplementary Data 2) and human (Supplementary Data 5) was done using ChIPseeker 50 . STRING (https://string-db.org/) was used to perform GO enrichment analysis and the result was ranked by strength.
NMF cluster analysis of human GSC ATAC-seq data. The NMF method 51 was used to cluster Human GSC ATAC-seq with nsNMF 52 . In brief, ATAC-seq peaks were ranked according to their variance from high to low. The cophenetic correlation score was calculated with cluster number 2 to 8, and used to determine the number of clusters following the standard method 51 . Top 70000 (20%) ATAC-seq peaks from hGSC were used to build NMF clusters.
Cluster-specific TF motif enrichment analysis of unique ATAC-seq peaks. Specific ATAC-seq peaks for different mGSC groups and different hGSC clusters were first calculated with DESeq2 using default parameters, and the specific differential peaks for each group/cluster were analyzed for TF motif enrichment. In brief, the Homer vertebrate TF database was used as input of TF motifs in chromVAR, then TF accessibility deviation values for each sample were calculated across the whole sample set. TF deviations with a threshold larger than 1 were kept, and TF motifs with positive correlation with one group/cluster was selected to represent that group/cluster. TFs were ranked based on their variabilities for each group/cluster, and z-scores of deviations from each TF were visualized in a heatmap.
Correlation analysis of enriched TF motifs and corresponding TF gene expression. To more accurately predict TF activity, TF deviation and gene expression were combined in a Pearson correlation analysis with the threshold p value <0.05.
Identification of significantly enriched and variable TF motifs of unique ATAC-seq peaks. To identify all significantly enriched and variable TF motifs of the mGSC group-unique and hGSC cluster-unique ATAC peaks the deviation score of TF motifs was positively correlated to each cluster with the cut-off parameters of variability >1.5 and q value <0.05. TFs representing each mGSC group are listed in Supplementary Data 3 and TFs representing each ATAC50 cluster are listed in Supplementary Data 6.
Linkage of hGSC ATAC-seq data to genes and gene expression correlation analysis. To analyze the correlation between human ATAC-seq data and gene expression array data we performed a correlation-based approach. First, ATAC-seq peaks were annotated to their nearest genes ("peak-to-gene linkage"; Supplementary Data 5) within ±0.5 Mbp but ±3 kbp of TSS. For each pair, the Pearson correlation between the ATAC-seq peak accessibility and the gene expression level was calculated. Next, the mean and standard deviation for these correlations were calculated to represent nonspecific correlation. Then, multiple correction was performed using Benjamini-Hochberg procedure to adjust these p values. At last, only pairs with false discovery rate (FDR) < 0.05 were kept.
TFs footprint analysis for cluster-specific hGSC ATAC-seq peaks. In previous descriptions, Tn5 transposase inserted two adaptors separated by 9 bp 54 . Sequencing reads aligned files in sam format by offsetting +4/−5 bp for all the reads depending on the strand of reads. A shifted base sam file converted to bam format and was sorted by samtools 55 . ATAC-seq reads for each ATAC50 cluster of samples (C1, C2, and C3) were concatenated and 200 million reads were randomly selected from each cluster and merged into bam files. Then TF footprint analysis was performed on cluster-specific regions using the HINT-ATAC software. Input motifs were obtained from the Homer 56 database of vertebrates. Four hundred and fourteen motifs were tested and filtered with p value <0.05. The normalized read counts were centered by the motif sites around 200 bp genomic region for visualizing motif footprints.
Consecutive sphere formation assays. Adherent human GSC cultures were dissociated with TrypLE (Thermo Fisher, 12563011) into single-cell suspensions. For primary sphere formation, 1000 cells/well were seeded in eight replicates in a 24-well low attachment plates. After 7 days, the number of primary spheres formed for each culture were counted. The primary spheres were dissociated and 1000 cells/well were again seeded in eight replicates for secondary sphere formation that were counted after 7 days. The same procedure was repeated for the tertiary spheres.
Proliferation analysis. Human GSC cultures (5 × 10 3 cells/well) were seeded (Day 0) on laminin-coated coverslips in a 24-well plate using serum-free medium. The next day (Day 1) 1 μg/μl of BrdU (Sigma, B5002) was added to each well for 16 h before they were fixed with 4% formaldehyde (Histolab, 02176) were washed with PBS and permeabilized in 2 M HCl for 20 min, followed by the washing again with PBS. Cells on the coverslips were permeabilised in 0.2% Triton X-100 with 3% bovine serum albumin (Sigma, A7906) for 5 min and washed thrice with PBS. Cells were blocked in 0.2% Triton X-100 solution containing 1% BSA and 5% normal goat serum (Dako, X0907) for 1 h. Primary antibody again BrdU (1:100; Abcam, Ab6326) was applied overnight in a humidified chamber at 4°C. Cells were washed three times with 0.2% Triton X-100 solution for about 5 min each time. Secondary antibody incubation was performed for 30-60 min in room temperature with anti-rat Alexa 555 (1:400; Invitrogen, A21434). Lastly cells were washed three times with 0.2% Triton X-100 solution for about 5 min each time and mounted in Fluoromount (Sigma, F4680) with 0.1% DAPI in it. The stainings were visualized and quantified under a LEICA DMi8 Fluorescent microscope. The experiment was repeated three times on consecutive passages for all the cultures.
Extreme limiting dilution assay. Human GSC cultures were dissociated into single-cell suspensions in serum-free medium. Cells were seeded in a 96-well low attachments plates (CLS3474-24EA; Sigma) with the seeding density ranging from 100 cells to 1 cell per well, with 10 replicates per condition. After 7-10 days, the number of wells without spheres for each cell density were counted. The number of cells required to form at least one sphere per well was calculated by extrapolating the values of x-intercept for each culture and plotted using PRISM 7 software. The experiment was repeated three times for all cultures.
Invasion assay. Cell spheres obtained from hGSC cultures by seeding 50 or 100 cells per well in the ELDA experiment were used to measure the invasive capacity for each culture. Collagen gel matrix was prepared and spheres were transferred into the collagen gel matrix sandwich in a 24-well plate as described previously 26 . Pictures were taken after 10 min and 24 h in an Eclipse TS 100 Nikon microscope. Image J 1.52a software was used to measure the invasion area for each sphere. From each hGSC culture at least 10 spheres were analyzed. The experiment was repeated three times on consecutive passages for all cultures.
Drug response analysis in HGCCs. A panel of 28 anticancer compounds (Supplementary Data 8) were used to measure the drug sensitivity of hGSC cultures. Cells were seeded, 1000 cells/well in a poly-ornithine (Sigma, P3655) and laminin (Sigma, L2020) coated 384-well plate (Thermo Fisher Scientific, 164688) with six replicates for each culture and dose. The following day cells were exposed to the compounds and drug response was measured after 72 h with the non-clonogenic fluorometric microculture cytotoxicity assay. The experiment was repeated twice. Dose response curves were plotted and the average area under the curve values for each compound and culture were calculated. For each ATAC50 cluster (C1, C2, C3) the average of AUC values for each compound was calculated and compared, pairwise, between ATAC50 clusters. The log10 fold change between clusters were calculated using Wilcoxon test and scatter plots produced as described previously 21 .
In vivo xenograft analysis. All animal experiments were performed in accordance with the rules and regulations of Uppsala University and approved by Uppsala animal ethics committee (C237/12 and C182/14). Intracranial cell transplantations of human GSC cultures were performed in neonatal NOD.CB17-Prkdcscid/NCrHsd mice (Harlan) of both sexes as previously described 22,25,26,28 . Of the 321 mice included in the study 60 were previously unpublished (Supplementary Data 9). In brief cells were dissociated in TrypLE and resuspended in DMEM/F12 medium. A volume of 2 μl cell suspension with cells ranging from 10,000 till 200,000 was orthotopically injected using a motorized stereotaxic injector (Stoelting CO). The coordinates measured from lambda were anterio-posterior 1.5 mm, medio-lateral 0.7 mm, and dorso-ventral 1.5 mm. Mice were monitored every second day and euthanized through exposure to carbon dioxide upon symptoms of illness according to the ethical permit humane endpoint, determined by Uppsala University guidelines on the recognition of pain, distress, and discomfort in experimental animals. Such symptoms could for example include immobility, piloerection, hunched posture and weight loss of 10%. Only mice that showed disease symptoms before the endpoint of the experiment were used in the survival analysis.
Quantification and statistical analysis. Statistical analysis was performed using GraphPad PRISM 7 software or R version 3.4.0. Figures containing data from multiple repetitions of experiments were presented as mean ± SEM. For sphereformation, ELDA, proliferation, and invasion experiments Student's t-tests were performed. For mice and patient survival, log-rank (Mantel-Cox) test was the statistical method used to calculate the significance in between the groups/clusters.
Cross-species analysis of TF motifs in cluster-specific mouse and human ATAC peaks. The deviation scores of significantly enriched and variable TF motifs of mouse (Supplementary Data 3) and human (Supplementary Data 6) GSC cultures were compared. If the average deviation was larger than zero the TF was defined as positively correlated.
Cross-species analyses of MCO genes and unique mouse ATAC peaks. MCO genes were converted to human homolog genes and ATAC-seq data of promoter regions or DRE regions of these genes were used to build NMF clusters. For the mGSC ATAC-guided clusters we used the annotated mouse genes from the unique mouse ATAC peaks (Supplementary Data 2) and converted these to human homologs. The human homolog genes were converted to human ATAC peaks through the peaks-to-genes analysis (Supplementary Data 5). NMF analysis was performed on these hGSC ATAC peaks. To analyze the major regulatory sequences in the (input) unique mGSC ATAC peaks and the (output) hGSC ATAC peaks we performed TF motif enrichment analysis using HOMER of both data sets and compared the top-10 ranked enriched TF motifs.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All raw data ATAC-seq data from human GSC and mouse GSC generated in this study have been deposited in Gene Expression Omnibus of National Center for Biotechnology information under access code GSE163853. The publicly available gene expression data of human GSC used in this study is from previous published report 22 and available in the Gene Expression Omnibus of National Center for Biotechnology information under access code GSE91393. The in vivo survival data of intracranially injected immunedeficient mice from previous published reports 22,25,26,28 are available in Supplementary data 9. The remaining data are available within the Article, Supplementary Information or Source Data file. Source data is provided with this paper.