In silico validation of RNA-Seq results can identify gene fusions with oncogenic potential in glioblastoma

RNA-Sequencing (RNA-Seq) can identify gene fusions in tumors, but not all these fusions have functional consequences. Using multiple data bases, we have performed an in silico analysis of fusions detected by RNA-Seq in tumor samples from 139 newly diagnosed glioblastoma patients to identify in-frame fusions with predictable oncogenic potential. Among 61 samples with fusions, there were 103 different fusions, involving 167 different genes, including 20 known oncogenes or tumor suppressor genes (TSGs), 16 associated with cancer but not oncogenes or TSGs, and 32 not associated with cancer but previously shown to be involved in fusions in gliomas. After selecting in-frame fusions able to produce a protein product and running Oncofuse, we identified 30 fusions with predictable oncogenic potential and classified them into four non-overlapping categories: six previously described in cancer; six involving an oncogene or TSG; four predicted by Oncofuse to have oncogenic potential; and 14 other in-frame fusions. Only 24 patients harbored one or more of these 30 fusions, and only two fusions were present in more than one patient: FGFR3::TACC3 and EGFR::SEPTIN14. This in silico study provides a good starting point for the identification of gene fusions with functional consequences in the pathogenesis or treatment of glioblastoma.


Results
Tumor tissue samples were obtained from 329 of the 432 glioblastoma patients registered in the GLIOCAT project 16 . After multiple RNA extractions from each sample, 357 RNA libraries were prepared and RNA-Seq results were obtained for 151 patient tumor samples. Fusions were assessed with STAR fusion 6 in 139 formalinfixed, paraffin-embedded (FFPE) samples obtained at first surgery. Four of these samples had paired fresh-frozen (FF) samples obtained at first surgery and four had FFPE samples obtained at a second surgery performed at the time of recurrence. Data on molecular subtypes 17 according to the Gene Set Enrichment Analysis (ssGSEA) and Instrinsic Glioma Subtypes (IGS) algorithms were available for 124 samples obtained at first surgery using the GLIOVIS and the R clusterRepro packages 18,19 . Gene fusions by initial RNA-Seq in glioblastoma samples. Among the 139 patients with FFPE samples obtained at first surgery, RNA-Seq detected one or more fusions in 61 (43.9%). Table 1 displays the patient characteristics according to the presence or absence of fusions. Fusions were more prevalent in the classical TCGA and the IGS-18 subtypes. Tumors with MGMT methylation had more fusions than those without methylation. Of the 68 tumors with MGMT methylation, 36 (59%) had fusions, while of the 63 tumors without MGMT methylation, only 20 (32.8%) had fusions (P = 0.01).
Using FusionHub 20 , we classified fusions according to the type of genes they included. The 103 fusions involved 167 different genes (Supplementary Tables S2A and S2B and Supplementary Dataset 1) that can be classified as follows: 1) known oncogenes or tumor suppressor genes (TSGs) (n = 20, 11.9%); 2) genes associated with cancer but not oncogenes or TSGs (n = 16, 9.6%); 3) genes not associated with cancer but involved in fusions in gliomas (n = 32, 19.2%); 4) not associated with cancer and not involved in fusions in gliomas (n = 99, 59.3%) (Supplementary Tables S3A-D Selection of fusions with oncogenic potential. As shown in Fig. 2, we first eliminated six fusions (detected in 45 samples) because they had previously been detected in healthy tissue (Table 2), which would indicate no relevant role in cancer. Of the remaining 97 fusions, ten (detected in 14 samples) had previously been detected in cancers, including gliomas (Table 2) and 21 (detected in 33 samples) had not previously been identified in cancer but included an oncogene or TSG (Table 3). The remaining 66 fusions have not been described in healthy tissue or in cancer and did not include an oncogene or TSG.
After eliminating the frame-shifted fusions, verifying the breakpoints with the Integrative Genomics Viewer 21 and running Oncofuse 22 , we classified the remaining 30 fusions in the four previously established categories: (1) six were previously described in cancer; (2) six were not previously described in cancer but involved an oncogene or TSG; (3) four were predicted by Oncofuse to have oncogenic potential; and (4)  www.nature.com/scientificreports/ that can produce a protein but that have not previously been described in cancer, do not involve an oncogene or TSG, and were not predicted to have oncogenic potential by Oncofuse (Supplementary Dataset 2). These 30 different fusions were considered to have oncogenic potential (Table 4).
Clinical characteristics of patients with gene fusions with oncogenic potential. Twenty-four patient samples harbored one or more of the 30 fusions categorized as having oncogenic potential (Table 4). Two tumor samples had two fusions, two had three fusions, and one had four fusions. When we compared the clinical characteristics of the patients whose tumors had one or more of these fusions, there was no correlation www.nature.com/scientificreports/ with patient age or MGMT methylation status. All patients except one were IDH wild-type. Two patients were secondary glioblastomas with a history of previous low-grade glioma that had been treated with surgery alone. Three patients were classified as TCGA mesenchymal subtype, one of whom was classified as IGS-23 subtype, while the remaining patients were TCGA classical or proneural. Two fusions had previously been associated with glioblastoma: FGFR3::TACC3 and EGFR::SEPTIN14. Three patients had the FGFR3::TACC3 fusion, all of whom were men older than 63 years and one of whom had MGMT methylation. Two patients had the EGFR::SEPTIN14 fusion, both of whom were women with MGMT methylation  Procedures and data bases used in the present study to select the fusions with oncogenic potential. From the long list of fusions detected by RNA-Seq, we used STAR-Fusion to detect fusion genes and FusionInspector to validate predicted fusions. We then used FusionHub to eliminate fusions previously described in healthy tissue, identify fusions previously described in cancers, and explore whether either gene had been identified as an oncogene or tumor suppressor gene (TSG) or had been associated with cancer. We next used FusionValidate to select only in-frame fusions and finally ran Oncofuse to predict the oncogenic potential of each fusion.  www.nature.com/scientificreports/ (Table 4). The remaining fusions with oncogenic potential were each found in only one patient; this low frequency precluded a validation by RT-PCR of these fusions. There were no differences in overall survival between patients with no fusions, those with fusions with oncogenic potential, and those with non-oncogenic fusions (p = 0.59).

Comparison of fusions in FFPE vs FF tumor tissue. Four patients had paired FFPE and FF tis-
sue from the first surgery and four others had paired FFPE tissue from both the first and second surgery. More fusions were detected in FF than in FFPE tissue, but fusions with oncogenic potential were detected in both types of samples. EGFR::SEPTIN14 was detected in both FFPE and FF samples from one patient; CLIC4::SRRM1, ZMPSTE24::CACNA1D and ADD2::C2orf42 were detected in samples from another patient; and TBK1::TMPRSS12 was detected in samples from a third patient. FGFR3::TACC3 was detected in the FFPE sample from the first surgery but not in the sample from the second surgery.

Discussion
In order to explore the role of gene fusions in glioblastoma, we have analyzed fusions by RNA-Seq in tumor samples from 139 newly diagnosed, uniformly treated glioblastoma patients. Since our RNA-Seq results provided a long list of gene fusions, we performed an in silico study to predict their oncogenic potential. We first eliminated the fusions previously described in healthy tissue and then selected those previously described in cancer, those involving oncogenes or TSGs, and those identified by Oncofuse 22 as having oncogenic potential. We limited our Table 4. Characteristics of patients with tumors harboring one or more of 30 gene fusions with oncogenic potential. IHC immunohistochemistry, Pro proneural, Cla classical, Mes mesenchymal, NA not available. a In-frame fusion that can produce a protein but that has not previously been described in cancer, does not involve an oncogene or tumor suppressor gene, and was not predicted to have oncogenic potential by Oncofuse. b Fusion predicted by Oncofuse to have oncogenic potential. c Fusion not previously described in cancer but involving an oncogene or tumor suppressor gene. d Fusion previously described in cancer. www.nature.com/scientificreports/ selection to in-frame fusions that could produce a protein with a biological effect. We identified a final list of 30 gene fusions with oncogenic potential, which were present in glioblastoma samples from 24 of the 139 patients included in the study. We then examined the frequency of these fusions in our series of glioblastoma patients and their potential correlation with patient characteristics and outcome. RNA-Seq is useful in the assessment of tumors as a method to detect druggable fusions 23 . However, it provides a multitude of data that do not necessarily have biological significance. Moreover, several of the fusions have not been properly validated individually in the tissue in question, probably because of the large amount of data obtained and the difficulty of identifying the fusions that are biologically meaningful.
Methods for the detection of gene fusions are constantly evolving and it is certain that new methods will become available in the future. We used several methods in our analyses. For example, we used STAR fusion 6 but later ARRIBA 24 became available. Nevertheless, a recent study has shown that both of these methods outperformed others and were equally accurate at detecting fusions 25 . In addition, we ran DEEPrior 26 in parallel with Oncofuse 22 but chose Oncofuse as the final method since we found that Oncofuse results were more reliable. We also considered using PEGASUS 27 but at the time of our study, it used the old version of the human genome (hg19), which dates from 2014. Finally, another method, ChimerDriver, has just been reported this year 28 . This diversity of currently available and newly emerging platforms means that it will be necessary to carefully determine the best method to use in the future to detect gene fusions and to identify those with oncologic potential. Two fusions identified as having oncogenic potential in our study had previously been associated with glioblastoma: FGFR3::TACC3 and EGFR::SEPTIN14. The FGFR-TACC fusion has been reported in 1.2-8.3% of glioblastomas 29,30 . The latest WHO classification of gliomas describes fusions that occur in IDHwild-type glioblastoma at an estimated frequency of > 1% 10 . EGFR, one of the most frequent genes involved in recurrent in-frame fusions, is commonly found fused to SEPTIN14 or to PSPH, with a frequency of 4% and 2.2%, respectively, in glioblastoma 31 . In our series, EGFR was involved in fusions in 5% of patients but not all the fusions involving EGFR had oncogenic potential. In fact, in-frame fusions involving EGFR with oncogenic potential were only detected in four patients (2.8%): EGFR::SEPTIN14 in two samples, SEC61G::EGFR in one, and EGFR::R3HDM2 in one. Of these, only the EGFR::SEPTIN14 fusion was a bona fide driver, as the SEC61G::EGFR and EGFR::R3HDM2 fusion proteins would lack the EGFR tyrosine kinase domain. The remaining EGFR fusions detected would produce either a frameshift transcript or no transcript at all.
Other EGFR alterations are also frequent in glioblastoma, including EGFR amplification, the EGFRvIII mutation, and altered splicing and rearrangements 32,33 . In our study, EGFR amplification was detected by FISH in all cases with EGFR fusions except one (a case with the EGFR::SEPTIN14 fusion where there was insufficient available tissue for FISH analysis) (data not shown). The co-occurrence of EGFR fusions with EGFR amplification and EGFR vIII (exon 2-7 deletion) has also been previously reported 34 . This "two-hit" alteration has been described for several oncogenes in different tumor types, and it has been suggested that these oncogenes would be dosage-sensitive, with amplification of a mutated copy further increasing tumor fitness 35 . This could be the case in our specimen with the co-occurrence of the EGFR::SEPTIN14 fusion and EGFR amplification, but it would not explain the existence of putatively non-functional EGFR fusions in cases with EGFR amplification. However, previous studies in glioblastoma have reported an increase in DNA breaks near genes targeted by copy number gains, including EGFR 36 . Taking this into account, we can speculate that the non-functional EGFR fusions could be the by-product of localized genome instability and would thus have no significance in the biology of the tumor. Along these same lines, in our study, we have detected several non-productive gene fusions in the 12q region, another breakpoint-enriched region in glioblastoma.
Unfortunately, therapies targeting different alterations of EGFR have failed to confer survival benefit [37][38][39][40][41] , although these studies did not include EGFR fusions. Fusions involving the NTRK genes have also been reported in glioblastoma, but they are more common in pediatric populations 42  In the present study, we have identified 30 fusions with oncogenic potential. Each of these fusions was detected in < 1% of cases. Therefore, although our intention was to validate by RT-PCR the fusions identified in our in silico study, their low frequency made it unreasonable to do so in our series of patients. Such a low frequency of potentially oncogenic gene fusions suggests that the detection of individual fusions by RT-PCR would be neither reasonable nor cost-effective and that RNA-Seq would thus be the best procedure for searching for targetable fusions. Moreover, we found no correlation with patient characteristics that could identify a patient as a potential holder of any specific fusion.
Nonetheless, although many of the fusions identified in our study have not yet been described in glioblastoma, several of them involve actionable gene alterations that have been successfully targeted in other cancers. Considering the rarity of specific gene fusions in glioblastoma, it is not feasible to conduct a clinical trial limited to this subset of patients. However, the implementation of NGS in the molecular characterization of tumors is helping to identify a constantly increasing number of molecular alterations that are present in small subsets of a plethora of tumor types. We therefore believe that the NGS analysis of glioblastoma may allow the inclusion of glioblastoma patients in exploratory basket trials of specific tumor-agnostic biomarkers, as has been done with other rare gene alterations 48 . Our in silico study to detect in-frame fusions with oncogenic potential thus provides a good starting point for the identification of fusions that may be relevant to the pathogenesis or treatment of glioblastoma.

Methods
Patients and study design. From 2004 to 2014, the GLIOCAT project 49,50 collected clinical data from 432 consecutive glioblastoma patients from six institutions, all of whom had received the standard first-line treatment (surgery followed by radiotherapy with concurrent and adjuvant temozolomide). The pathological diagnosis was confirmed by pathologists according to WHO 2007 classification guidelines 51 before patients were included in the project. Once selected for inclusion, each case was anonymized and given a number to identify it across all data.
The following data were recorded: age, sex, symptoms, tumor characteristics, radiological characteristics, type of surgery, post-surgical performance, Mini Mental Status Examination (MMSE) score, details of radiotherapy and temozolomide treatments and treatment at relapse, date of progression, subsequent treatments, date and status at last control, and date and status of death or last control alive. Once patients were included in the study, MGMT methylation status was determined if it had not previously been assessed.
This study was approved by the Institutional Review Board of the Hospital Germans Trias i Pujol (PI-14-016) and by the Ethics Committees of all the participating institutions and their biobanks and was conducted in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments. All patients or their representatives gave their informed consent.
Tissue microarray (TMA) construction and immunohistochemical analyses (IHC). TMAs were constructed using a Veridiam Tissue Array Instrument (El Cajon, Ca, USA), model VTA-100, using a 1-mm diameter needle. Consecutive 4-µmm-thick sections were obtained and hematoxylin-eosin staining was done in sections 1, 20, and 40 in order to evaluate the persistence of the tumor at each spot.
DNA extraction and assessment of MGMT methylation. DNA was extracted from two 15-µm sections of FFPEtissue using the QIAamp DNA Mini Kit (QIAGEN GmbH, Hilden, Germany), following the manufacturer's protocol. In cases with less than 50% of tumor cells, the tumor tissue was macrodissected manually. Then 500 ng of extracted DNA was subjected to bisulfite treatment using the EZ DNA Methylation-Gold Kit (Zymo Research Corporation, Irvine, CA). MGMT promoter methylation status was determined by methylation-specific PCR (MSP) as previously described 52 .
RNA-Seq assessments. RNA extraction from FFPE and FFsamples was performed on five 15 µm-deep tissue sections using the RNeasy FFPE Kit (Qiagen, Hilden, Germany) according to the manufacturer's recommendations. RNA quantity and purity were measured with the NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and the Qubit RNA HS Assay Kit (Invitrogen, Eugene, OR, USA). The highest-quality RNA samples were sent to the Centro Nacional de Análisis Genómico (CNAG-CRG, Barcelona, Spain) for analysis by RNA-Seq. Methods for assessing quantification, purity and quality of samples have been previously described 16,49 .
The libraries were sequenced on HiSeq2000 (Illumina) in paired-end mode with a read length of 2 × 76 bp using TruSeq SBS Kit v4. Each sample was sequenced in a fraction of a sequencing v4 flow cell lane, following the manufacturer's protocol. Image analysis, base calling and quality scoring of the run were processed using Real Time Analysis (RTA 1.18.66.3) software and followed by the generation of FASTQ sequence files by CASAVA.
Classification of glioblastoma molecular subtypes. The TCGA classification of glioblastoma molecular subtypes 17,33,53 was performed with the GlioVis portal 18 . The GlioVis glioblastoma TCGA cohort according to the ssGSEA method was selected as training dataset for both glioblastoma molecular subtype and Glioma CpG island methylator phenotype (G-CIMP) predictions. The IGS classification of glioblastoma molecular subtypes was done with the R clusterRepro package 17,19 , using the centroids for IGS0, IGS9, IGS16, IGS17, IGS18, IGS22 and IGS23 subtypes, as described by Gravendeel 19 and used in several European series 54,55 . Figure 2 shows the procedures and data bases used in the present study to select the fusions with oncogenic potential. STAR-Fusion (https:// github. com/ STAR-Fusion/ STAR-Fusion/ tree/ STAR-Fusion-v1.9.0) 6 was used to detect fusion genes based on discordant read alignments. Predicted fusions were further validated with FusionInspector in "validate" mode, which re-aligns the reads to a reference containing the genome and the fusion-gene contigs identified in the former step. Candidate fusions were annotated according to prior knowledge of fusion transcripts relevant to cancer biology (or previously observed in normal samples and thus less likely to have oncogenic potential) and assessed for the impact of the predicted fusion event on coding regions, indicating whether the fusion was in-frame or frame-shifted, along with combinations of domains expected to exist in the putative chimeric protein.

Identification of candidate gene fusions.
Selection of fusions with oncogenic potential. We then used FusionHub 20 (https:// fusio nhub. demop ersis tent. com/), which provides information from 28 public fusion and gene databases, and other data bases in the literature 56 (Fig. 2). We first eliminated fusions previously described in healthy tissue. We then identified fusions previously described in cancers, including gliomas, and looked at whether any of the genes in the fusions was known to be fused with other genes in cancers. Finally, we explored whether either gene had been identified as an oncogene or TSG or had been associated with cancer. www.nature.com/scientificreports/ Next, we selected only in-frame fusions, which could produce a protein with biological effect, and manually verified the break-points with Integrative Genomics Viewer (IGV, version 2.9.4) using the reference sequence hg38 21 . We reviewed the exons of each gene involved in the fusion as well as the amino acids with respect to the reference sequence.
To predict the oncogenic potential of each fusion, we ran DEEPrior 26 and Oncofuse 22 . We found that DEEPrior did not predict an oncodriver role for the known oncogenic fusion FGFR3-TACC3, which led us to choose Oncofuse (www. unav. es/ genet ica/ oncof use. html) for our analysis. Oncofuse provides information on the Bayesian probability of a fusion being a driver (or class 1), with a higher value indicating a higher probability, or a passenger (or class 0) by giving a Bonferroni-corrected P-value that does not take into account whether the fusion is in-frame when calculating the P-value. We set the probability of a fusion being a driver at P > 0.75 and the P-value for it being a passenger at P < 0.05.
These procedures provided us with a final list of fusions with probable oncogenic potential in glioblastoma and allowed us to classify them into four categories: (1) fusions that had previously been described in cancer; (2) fusions that had not been described in cancer but that involve genes previously described as oncogenes or TSGs; (3) fusions that did not meet the above conditions but had a high Oncofuse probability of having oncogenic potential; and (4) fusions that did not meet any of these conditions but produce a protein product. We then looked at the incidence of the selected fusions in our sample set. We compared the results obtained in FFPE and paired FF tissue from the same patient and compared the results found in samples obtained at initial surgery and those obtained at relapse.

Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request. Molecular data underlying the findings described in the manuscript are fully available without restriction from the Bioproject Sequence Read Archive: https:// www. ncbi. nlm. nih. gov/ sra/ PRJNA 833243 and http:// www. ncbi. nlm. nih. gov/ biopr oject/ 613395.