Transcriptome and protein interaction profiling in cancer cells with mutations in histone H3.3

Mutations of histone variant H3.3 are highly recurrent in childhood glioblastoma and in young adults with Giant Cell Tumor of the Bone (GCTB). The heterozygous occurrence of the mutations in the tumors, together with the potential redundancy between histone H3 and H3.3, suggests that the mutations are gain-of-function in nature. To address common H3.3 point mutations, we have generated data from GCTB patient samples with H3.3 G34W substitutions and engineered human GFP-tagged H3.3-mutated isogenic cell lines for high-throughput data comparisons. First, a total of thirty-six patient samples and cell lines were used to acquire transcriptome data using microarray and RNA-sequencing. The expression data were validated with the orthogonal nCounter assay. Second, to uncover the H3.3-GFP interaction proteomes from the isogenic cell lines, immunoprecipitations of unmutated wild type H3.3 and of H3.3 carrying the K27M, G34R, or G34W substitution were performed. The RNA-sequencing data and the H3.3 interaction proteome may provide important functional insight into the tumorigenic process and should spur further detailed analysis.


Background & Summary
Recurrent genomic alterations are a cornerstone in cancer development and reproducible mutational profiles have been observed at defined genomic locations. Whole genome sequencing of tumors from adults and seniors has uncovered genomic landscapes with hundreds of small and large-scale mutations, revealing the complexity of chromosomal rearrangements influencing tumorigenesis 1 . In contrast, tumors from children and young adults develop from a less complex cytogenetic background, simplifying the process of uncovering the driving forces of tumorigenesis and their transcriptional profiles. This is exemplified by the recent discovery of recurrent mutations of histone H3 and the replication-independent variant H3.3 in pediatric Glioblastoma 2 and Giant Cell Tumor of the Bone (GCTB) 3 where very few other genic mutations were discovered by whole-exome sequencing. Yet, the histone mutations affected global epigenetic modifications of important residues and dramatically changed the expression of targeted genes. These changes are seemingly driven by histone mutations that lead to amino acid substitutions at, or very near, frequently post-translationally modified amino acids; namely the K27M, G34R/V/L/W, and K36M substitutions 3 .
The power behind these histone mutations can be traced to their role as pre-eminent binding sites for proteins influencing transcription. The first, and perhaps the best example, is the H3.3 K27M substitution leading to binding and catalytic inhibition of the polycomb repressive complex 2 (PRC2), with global loss of H3K27 methylation across the genome as a result 4,5 . A similar example is the H3.3 K36M substitution binding the enzymes MMSET and SETD2, leading to global reduction of H3K36 methylation 6 . These two examples indicate that mutations of the N-terminal tail of histones can acquire novel binding properties and greatly influence transcription. We address the possibility that the cancer-related histone H3.3 substitutions in this study bind yet unknown proteins, allowing them to be purified and characterized by gel separation and mass spectrometry.
As presented here, the transcriptome and interactome from GCTB biopsies and isogenic cell lines, respectively, with mutations of H3.3 have been made available (Fig. 1). We first isolated the stromal compartment from GCTB biopsies and established primary cell lines for further studies (Fig. 1a). The enriched cells were used to generate gene expression microarray and RNA-sequencing data. We then generated isogenic cell lines in HEK293 cells by targeting the endogenous H3F3A locus encoding H3.3 with mutations of H3.3 fused to GFP (Fig. 1b). From total protein extracts and immunoprecipitations, we purified the proteins interacting with the H3.3 mutated constructs. This constituted the comprehensive H3.3 interactome. Together, we collected data from several analysis platforms with the ambition to understand the function of H3.3 in cancer (Fig. 1c), which in part has been described in a previous publication 7 .
In that publication, we compared unmutated and H3.3 G34W tumors and found a distinct gene expression pattern likely dictated by the single mutation in H3.3. We uncovered several known genes associated with GCTB, e.g. that RANKL is affected via the downregulation of its decoy receptor OPG, and that the entire IGFBP family of genes appears to be downregulated 7 . These genes are all targeted by the E2F transcription factor family.
We also reasoned that uncovering the interaction proteome of H3.3 in its normal and mutated form would allow us to gain further insight into its function in cancer. As previously mentioned, isogenic stable cell lines with each of the four versions of H3.3 (WT, K27M, G34R, and G34W) were established in HEK293 cells. To avoid confusion with patient samples, the engineered isogenic versions carry the prefix "iso", e.g. H3.3 WT is denoted isoH3.3 WT. The H3.3 interaction proteome uncovered 493 proteins of which about half (225) were commonly bound by all constructs (Fig. 2a), and around 100 proteins uniquely interacted with each tested mutant of H3.3 when using WT as a reference binder (Figure 2b). While some proteins substantially lose interaction with H3.3, some also gain in binding capabilities (blue line in Figure 2c).
Our analysis in the previous study mainly focused on the H3.3 G34W substitution in GCTB, filtered against other H3.3 interaction proteomes to enrich for the ones that specifically represented G34W (Figure 2a and g). This strategy successfully identified and verified the interaction of splicing-related factors (most prominently hnRNPA1L2), but not all proteins identified as binding to the various mutants of H3.3 were characterized. In this Data Descriptor we present the complete interaction proteome of H3.3 with WT versus K27M, G34R, and G34W substitutions (Table 1 and Figure 2). For a brief overview, identified proteins have been listed based on protein-protein interaction scores (Figure 2d-g). While H3.3 is a protein known to exert its function in the nucleus, the H3.3 interactomes also contain proteins from other cellular compartments (Figure 2h). We hope that the presented data will be used in conjunction with other aspects of H3.3 function in development and disease.
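The comparisons described above (proteins shared by all constructs, proteins gained by a mutant relative to WT, and proteins specific to a single construct) amount to set operations on the per-construct protein lists. The following is a minimal sketch in Python, assuming each interactome has already been reduced to a set of protein identifiers; the identifiers P1 to P7 and the function names are hypothetical placeholders, not values from the deposited data.

```python
# Hypothetical protein identifiers per construct; the real lists would be
# derived from the LC-MS/MS results deposited at PRIDE.
interactomes = {
    "WT":   {"P1", "P2", "P3", "P4"},
    "K27M": {"P1", "P2", "P5"},
    "G34R": {"P1", "P2", "P3", "P6"},
    "G34W": {"P1", "P2", "P5", "P7"},
}


def shared_by_all(sets):
    """Proteins detected with every construct."""
    return set.intersection(*sets.values())


def gained_vs_reference(sets, mutant, reference="WT"):
    """Proteins bound by a mutant but not by the reference construct."""
    return sets[mutant] - sets[reference]


def specific_to(sets, construct):
    """Proteins bound by one construct and by none of the others."""
    others = set().union(
        *(s for name, s in sets.items() if name != construct)
    )
    return sets[construct] - others


all_proteins = set().union(*interactomes.values())
print(len(all_proteins))                                  # size of the union
print(sorted(shared_by_all(interactomes)))                # common core
print(sorted(gained_vs_reference(interactomes, "G34W")))  # gained vs WT
print(sorted(specific_to(interactomes, "G34W")))          # G34W only
```

Using WT as the reference and filtering against all other constructs are two different questions, which is why the sketch keeps them as separate functions.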

Methods
Please note that samples and the generation of data presented in this Data Descriptor have been previously presented in Lim et al. 7 . While the description of the methods here is very similar to that report, we specifically want to emphasize the samples and the methods used to generate the data, in particular the previously unavailable H3.3 interaction proteome from cancer-related recurrent substitutions.

Sample collection
The data in this Data Descriptor contain information generated from Giant Cell Tumor of the Bone biopsies and established control cell lines. Twenty tumor biopsies were collected from two cohorts, one from Germany and one from South Korea, to establish primary cell lines from the two clinics. Informed consent to analyse tumour tissue and to publish clinical details was obtained from all individuals included in the study. The use of patient samples and the experiments performed in this study were approved by, and conducted in accordance with the guidelines and regulations of, the Ethics Committees of the University of Heidelberg, University of Leipzig, University Medical Center Hamburg-Eppendorf, and the National Cancer Center of Korea (IRB NCC2015-0070).

[Figure 2 panel title: Unique to only isoH3.3 G34R]

[Table 1 (partially recovered): sample identifiers Lindroth_02, Lindroth_03, and Lindroth_04; the isoH3.3 K27M HEK293 entry (Lindroth_04) lists total protein extract, GFP immunoprecipitation, Q-Exactive LC-MS/MS, and PRIDE accession PXD009966.]

[...] protease inhibitors (Roche), Pepstatin A (Roche) and Aprotinin (Roche). Cleared lysates were diluted 1:2.5 in dilution buffer (10 mM Tris-HCl (pH 7.5), 150 mM NaCl, 0.5 mM EDTA). Total protein concentrations were determined by BCA protein assays (Pierce, Thermo Scientific), and equal loading prior to immunoprecipitation was verified by PAGE followed by Coomassie (BioRad) staining. The equal loading-adjusted protein lysates were incubated with 25 μl of equilibrated GFP-Trap-A bead slurry (Chromotek), washed and recovered with 2x Laemmli buffer (BioRad) supplemented with 10% β-mercaptoethanol. A new PAGE gel was run to separate the eluates, and stained with a silver stain kit (Pierce, Thermo Scientific) for verification.
Each lane was divided into three equal portions and subjected to in-gel tryptic digestion and LC-MS/MS separation with the Q Exactive Hybrid Quadrupole-Orbitrap Mass spectrometer instrumentation (Thermo Fisher Scientific).
1. For the RNA-sequencing data, we trimmed the reads using Trimmomatic 0.36 8 , mapped the reads with TopHat 9 to the human genome reference hg19, and assembled the reads with Cufflinks 10 .
2. Protein scores from the LC-MS/MS analysis were generated and analyzed with default settings using the SEQUEST package (Thermo Fisher Scientific).
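The RNA-sequencing steps listed above (trimming, alignment, assembly) can be sketched as three command lines assembled in Python. This is an illustration only: the single-end layout, the trimming steps (`SLIDINGWINDOW:4:15`, `MINLEN:36`), the thread count, the file names, and the Bowtie index name `hg19` are assumptions, since the parameters actually used are not stated here.

```python
import shlex


def build_pipeline(sample, reads_fastq, bowtie_index, threads=4):
    """Return the three commands (as argument lists) for one sample.

    All file names and parameter values are illustrative assumptions.
    """
    trimmed = f"{sample}.trimmed.fastq.gz"
    tophat_dir = f"{sample}_tophat"
    cufflinks_dir = f"{sample}_cufflinks"
    return [
        # 1. Read trimming with Trimmomatic 0.36 (single-end mode assumed).
        ["java", "-jar", "trimmomatic-0.36.jar", "SE",
         "-threads", str(threads), reads_fastq, trimmed,
         "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 2. Spliced alignment with TopHat against an hg19 Bowtie index.
        ["tophat", "-p", str(threads), "-o", tophat_dir,
         bowtie_index, trimmed],
        # 3. Transcript assembly with Cufflinks from TopHat's alignments.
        ["cufflinks", "-p", str(threads), "-o", cufflinks_dir,
         f"{tophat_dir}/accepted_hits.bam"],
    ]


commands = build_pipeline("sample1", "sample1.fastq.gz", "hg19")
for cmd in commands:
    # Print instead of executing; pass each list to subprocess.run to run it.
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps the sketch testable without the tools installed and avoids shell quoting problems if it is later run with `subprocess.run(cmd, check=True)`.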

Data Records
The gene expression microarray data, generated on the Illumina HT12 array platform using total RNA from Giant Cell Tumor of the Bone biopsies, have been made available through the Gene Expression Omnibus (GEO) in Data Citation 1.
The RNA sequencing data, generated from poly-dT selected RNA from primary cell lines isolated as outlined in Figure 1, have been made available through the Gene Expression Omnibus (GEO) in Data Citation 2.
The liquid chromatography and mass spectrometry (LC-MS/MS) data, generated from GFP immunoprecipitation experiments of isogenic HEK293 cell lines containing H3.3-GFP constructs, have been made available at the PRIDE Archive (www.ebi.ac.uk/pride/archive/) in line with the ProteomeXchange (PX) consortium guidelines. The public dataset (including results as mzIdentML files, peak files and raw data) is available under Data Citation 3.
Data indicating the cellular compartments of individual proteins were provided by the Human Protein Atlas (https://www.proteinatlas.org/about/download).
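Assigning interacting proteins to cellular compartments is essentially a lookup of each protein in an annotation table followed by a tally. Below is a minimal sketch, assuming a tab-separated annotation with one gene column and one semicolon-separated location column; the column names `gene` and `main_location`, the gene names, and the example rows are hypothetical and do not reproduce the Human Protein Atlas file layout.

```python
import csv
import io
from collections import Counter

# Hypothetical annotation table (tab-separated); a real table would be
# loaded from the downloaded annotation file instead.
annotation_tsv = (
    "gene\tmain_location\n"
    "GENE_A\tNucleoplasm\n"
    "GENE_B\tCytosol\n"
    "GENE_C\tNucleoplasm;Nucleoli\n"
    "GENE_D\tMitochondria\n"
)

# Hypothetical list of interacting proteins for one construct.
interactors = ["GENE_A", "GENE_C", "GENE_D", "GENE_X"]


def compartment_counts(tsv_text, genes):
    """Count how many of the given genes fall into each compartment.

    A gene with several locations is counted once per location; genes
    missing from the annotation are counted as 'Unannotated'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    locations = {
        row["gene"]: row["main_location"].split(";") for row in reader
    }
    counts = Counter()
    for gene in genes:
        for location in locations.get(gene, ["Unannotated"]):
            counts[location] += 1
    return counts


counts = compartment_counts(annotation_tsv, interactors)
for location, n in counts.most_common():
    print(location, n)
```

Counting a multi-location protein once per compartment is a design choice; fractional counting or using only the first listed location would give different totals.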

Technical Validation
In a previous report utilizing the data presented here, we validated our findings with orthogonal methods 7 . First, the gene expression microarray data from GCTB biopsies (Data Citation 1) were validated with deep sequencing technology of poly-dT selected RNA from primary cell lines (Data Citation 2) originating from an independent set of samples, not overlapping with samples used to generate the microarray data (Table 1). Hierarchical clustering and GO-term analysis indicated very similar transcriptional profiles and gene functional characteristics, suggesting that the two methods were congruent 7 .
Second, we validated the gene expression data with the RNA hybridization-based nCounter assay (NanoString Technologies). We found a moderate positive correlation between the two methods (Figure 3; correlation coefficient 0.5 for H3.3 WT and 0.48 for H3.3 G34W).
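A correlation coefficient between two platforms is computed over the genes measured on both. The sketch below implements the Pearson coefficient in plain Python; whether the reported values are Pearson or rank-based coefficients is not stated here, so the choice of Pearson is an assumption, and the expression values are made-up placeholders.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Placeholder log-scale expression values for the same genes on two platforms.
platform_a = [5.1, 7.3, 2.0, 9.4, 6.6]
platform_b = [4.8, 6.1, 3.5, 7.9, 7.2]
print(round(pearson_r(platform_a, platform_b), 2))
```

Because expression values from different technologies are on different scales, values are typically log-transformed before a Pearson coefficient is computed, or a rank-based coefficient is used instead.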
To uncover potential proteins binding the cancer-related mutated N-terminal tail of H3.3, we first generated the isogenic cell lines and validated proper endogenous gene targeting by PCR and Southern blot analysis 7 . After confirming the GFP expression of selected clones, we performed immunoprecipitations (IP) of the H3.3-GFP fusion proteins to uncover the H3.3 interactome, with emphasis on proteins that specifically interacted with the mutated forms. Mock IP with the parental control did not pull down any proteins after the washing procedures. From the H3.3 interactome, we found hnRNPA1L2 to be the strongest interactor of H3.3 G34W and validated the mass spectrometry data with PAGE and Western blot analysis 7 . Here, we make all data available to allow further analysis of expression data and the WT and mutant H3.3 interaction proteomes (Table 1).