The Prognostic 97 Chemoresponse Gene Signature in Ovarian Cancer

Patient diagnosis and care would be significantly improved by understanding the mechanisms underlying platinum and taxane resistance in ovarian cancer. Here, we aim to establish a gene signature that can identify molecular pathways/transcription factors involved in ovarian cancer progression, poor clinical outcome, and chemotherapy resistance. To validate the robustness of the gene signature, a meta-analysis approach was applied to 1,020 patients from 7 datasets. A 97-gene signature was identified as an independent predictor of patient survival in association with other clinicopathological factors in univariate [hazard ratio (HR): 3.0, 95% Confidence Interval (CI) 1.66–5.44, p = 2.7E-4] and multivariate [HR: 2.88, 95% CI 1.57–5.2, p = 0.001] analyses. Subset analyses demonstrated that the signature could predict patients who would attain complete or partial remission or no-response to first-line chemotherapy. Pathway analyses revealed that the signature was regulated by HIF1α and TP53 and included nine HIF1α-regulated genes, which were highly expressed in non-responders and partial remission patients than in complete remission patients. We present the 97-gene signature as an accurate prognostic predictor of overall survival and chemoresponse. Our signature also provides information on potential candidate target genes for future treatment efforts in ovarian cancer.


Results
Seven microarray datasets (GSE49997, GSE14764, GSE26712, GSE63885, GSE19829, GSE30161, The Cancer Genome Atlas -TCGA) and one TCGA RNA-seq dataset were used in this study. Data selection was based on availability of clinicopathological information such as OS time and event, grade, stage, chemoresponse, and related chip types. Detailed information for these datasets is described in the Materials and Methods (Supplementary Table S1).
The end point of this study was the identification of a gene signature prognostic of overall survival after first-line chemotherapy in both low-and high-risk groups. GSE49997 was used as the training dataset. A supervised method of risk prediction, with a stringent threshold cut off (p < 0.01 and 2-fold difference) was applied to the training dataset. After filtering for probe set intensity, 5,218 genes were analyzed using univariate Cox proportional hazard regression (p < 0.01) and 137 genes associated with OS were identified. However, only 97 genes were measured in the training dataset and all validation datasets (Fig. 1a). The 97-gene signature defined a threshold that split the training dataset into low-and high-risk groups with 91 and 93 patients, respectively. To evaluate the prognosis, Kaplan-Meier survival curves were plotted and the log-rank test showed a significant difference in OS (p = 1.2E-4; Fig. 1b) and progression-free survival (PFS) (p = 0.026; Fig. 1c). The expression pattern of the 97-gene signature divided low-and high-risk patients into two clusters (Fig. 1d).  Validation of the 97-gene signature in independent datasets. To further assess outcome prediction, we combined all of the validation datasets (total of 836 patients) to evaluate the robustness of the 97-gene signature. We further validated the prognostic significance of the gene signature in an external TCGA RNA-seq dataset. A flow chart of the procedure used to validate external datasets is depicted in Fig. 2a. Leave-one-out cross-validation (LOOCV) specificity and sensitivity used was 0.972 and 0.932 in Compound Covariate Prediction (CCP), respectively. The gene signature significantly stratified patients in all combined validations into low-and high-risk groups (p = 1.18E-08; Fig. 2b). Most of the independent datasets had a small number of patients. Therefore, to assess the generalizability of our gene signature, validation datasets were merged according to chip type as follows: GSE14764 and GSE26712 as GPL96; GSE63885, GSE19829 and GSE30161 as GPL570; and TCGA GPL96. The merged validation datasets' Kaplan-Meier plots predicted significant differences in prognosis: GPL96 (p = 8.1E-5; Fig. 2c), GPL570 (p = 0.00478; Fig. 2d), TCGA GPL96 (p = 0.0319; Fig. 2e) and TCGA RNA-seq as an independent validation (p = 0.044; Fig. 2f). Kaplan-Meier graphs for each independent validation dataset demonstrated a decrease in the overall survival of the high-risk group versus the low-risk group, but some datasets were statistically insignificant because of the small number of patients ( Supplementary Fig. S3).

Prognostic subclassification of patients in relation to current clinical covariates.
To investigate the prognostic significance of the 97-gene signature in association with stage and grade in all validation datasets, univariate and multivariate Cox regression analyses were performed. In both univariate and multivariate analyses, the 97-gene signature proved to be a stronger predictor of OS than grade and stage (HR 1.778, 95% CI 1.4-2.2, p = 1.2 E-6; and HR 1.749, 95% Cl 1.385-2.209, p = 2.6E-6 respectively; Table 2).
To assess whether the 97-gene signature could stratify patients in the low grade into low-and high-risk groups, patients in grades 1 and 2 were merged because only 5patients were in grade 1. The 97-gene signature significantly stratified patients into low-and high-risk groups (p = 0.00171; Fig. 3a). The gene signature also separated patients in grade 3 into low-and high-risk groups (p = 1.11e-06; Fig. 3b).
Further, to evaluate whether the 97-gene signature could classify patients in each stage into low-and high-risk groups, all patients from the validation datasets were merged according to stage: I (n = 20), II (n = 27), III (n = 548) and IV (n = 222). The signature clearly stratified patients in stage III (p = 0.000151; Fig. 3c) and stage IV (p = 2.02E-07; Fig. 3d).
Prognostic value of the 97-gene signature in association with chemoresponse. All patients in our study received first-line platinum and taxane combined chemotherapy. Patients were considered to be in complete remission (CR) if all detectable tumors disappeared after first-line chemotherapy, partial remission (PR) if the tumor significantly reduced in volume, and non-responder (NR) or stable disease (SD) if the tumor continued to grow or new tumors appeared. To evaluate whether our gene signature could accurately predict response to first-line chemotherapy, subset analyses were performed on TCGA (n = 326) and GSE30161 (n = 58) datasets, for which chemoresponse information was available. The signature clearly classified low-risk patients into CR and PR groups (p = 9.1E-18, Fig. 4a) and high-risk patients into CR and PR groups (p = 2.04E-11; Fig. 4b). The signature also dichotomized low-and high-risk patients into responders and non-responders (p = 1.12E-05 and p = 0.0309, respectively; Fig. 4c,d). To consolidate these results regarding chemosensitivity, we sorted both CR and PR case groups into low-and high-risk. Our gene signature showed that the low-risk group had better prognosis both in PR (Fig. 4e) and CR (Fig. 4f).
Gene Ontology term enrichment analysis and visualization of the 97-gene signature. We performed Gene Ontology enrichment analysis in DAVID to identify biological functions of the genes in the 97-gene  signature and identified 20 significant terms (biological processes), including metastasis, migration, growth factor binding, angiogenesis, regulation of apoptosis, and regulation of epithelial cell proliferation. False discovery rates were estimated using the procedure of Benjamin (p < 0.05, Supplementary    cell proliferation, angiogenesis, and metastasis (Fig. 5), and HIF1α regulated important genes involved in tumor cell growth, angiogenesis, and blood vessel morphology (Fig. 4). Furthermore, we showed that genes from the 97-gene signature that were involved in angiogenesis and metastasis were the most upregulated ( Supplementary  Fig. S3). We also demonstrated interactions using the STRING database to elaborate physical and functional association amongst the genes in the signature (Supplementary Fig. S4).
Significant association of the HIF1α-regulated genes with platinum and taxane. To understand the molecular mechanisms behind the three chemoresponse groups CR, PR, and NR, individual box plots for each of the nine genes were conducted (Fig. 6). HIF1α-regulated genes were found to be highly expressed in NR and PR but less expressed in CR cases.

Discussion
Several reports have attempted to show that gene signatures are capable of predicting individual clinical outcomes in ovarian cancer [16][17][18][19][20][21][22][23] . Prognostic gene signatures based on chemoresistance, molecular subtyping, and debulking tumors in ovarian cancer have been reported by analysis of homogenous datasets [24][25][26][27] . However, none of these are in clinical use yet because of difficulties caused by ovarian cancer heterogeneity. Hence, precise prognostic prediction for chemoresponse and mortality after first-line chemotherapy remains a critical issue in ovarian cancer treatment. Here, we developed a novel risk classification system based on a 97-gene signature from ovarian cancer patients 27 . Patients from various study groups were merged in order to get insight into distinctive endpoints of ovarian cancers and to overcome challenges reported by previous studies, such as homogeneity or small patient size. A supervised method was used to construct the gene signature that was then refined by LOOCV. Furthermore, a meta-analysis approach based on six Affymetrix microarray datasets (n = 836) and one RNA-seq dataset (n = 419) was applied to validate the prognostic significance of the gene signature in association with overall survival. The reproducibility of the gene signature was improved using only Affymetrix validation datasets. Univariate and multivariate analyses with adjustment for clinical parameters showed a significant association of our prognostic gene signature with survival rate. The 97-gene signature was also able to predict chemoresponse in both low-and high-risk patients. Moreover, our signature revealed nine HIF1α-regulated genes to be predictive of response to first-line platinum and taxane chemotherapy in ovarian cancer.
In clinical oncology, grade and stage are the current diagnostic criteria. However, the heterogeneity observed in therapeutic response within the same grade or stage indicates that histological classification is not an adequate predictor of survival 9 . In our study, univariate and multivariate Cox proportional hazards regression analyses of clinical variables based on OS in both the training and validation datasets also indicted that the 97-gene signature was a more reliable prognostic indicator than current clinical variables. Taken together, we think that the integration of the 97-gene signature with grade and stage could be a better approach for determining patient prognosis in clinical practice. The precise prediction of chemoresponse would greatly improve ovarian cancer patient prognosis by guiding chemotherapy choices. However, it is difficult to determine the response to first-line chemotherapy because of the complexity of ovarian cancer genetic heterogeneity 20 . Several investigators previously reported gene signatures associated with chemoresponse in ovarian cancer 28,29 . However, most of these studies were performed on small numbers of patients and were only prognostic, not predictive, of first-line chemotherapy response. The 97-gene signature could predict patients who would attain complete or partial remission after first-line chemotherapy in both low-and high-risk groups. In addition, the gene signature was also predictive of non-responders (NR) to first-line chemotherapy in both low-and high-risk groups. This predictability of our gene signature could benefit PR and NR patients by directing the choice of effective alternate and/or additional therapeutic options early in the disease.
Some of the genes in the 97-gene signature have been reported to play critical roles in biological processes, including angiogenesis, metastasis, and proliferation in ovarian cancer. These genes include: CLIP, COL3A1, COL10A1, LAMB1, INHBA, NBL, F3, PDGFRA, PDGFRB, PLAU, ROR2, SCG2, and ZFHX4 30 ; COL11A1, COL5A2, THBS2, TIMP3 and SINAI2 31 ; EGR2 32, 33 and FOS 34 . Moreover, we investigated transcription factors included in the 97-gene signature. Consistent with previous studies, p53 35,36 and HIF1α 22 were the upregulated transcription factors. However, a previous study reported that a large phase III trial of adenoviral wild-type p53 delivery combined with taxol and carboplatinum did not show any positive results in ovarian cancer 37 . To date, there is no accepted explanation of why this trial failed 38 . It is well known that the p53 signaling pathway has many links to ovarian cancer and is triggered by various stress signals 39 .
We investigated our novel nine HIF1α-regulated genes (ACTA2, EPOR, CTGF, FOS, SCG2, IGF1, CYR61, NT5E and NR4A). Interestingly, these genes had higher expression in NR and PR but lower expression in CR. Several studies reported ACTA2 upregulation in high-grade ovarian cancer 40,41 . Western blot analysis by McBroom JW demonstrated increased EPOR expression in multiple ovarian cancer cell lines 42 . CTGF was reported to be a target therapeutic gene in high-grade ovarian cancer 43 . FOS has also been reported as a molecular predictor of progression and survival in ovarian cancer 44 . SCG2 was mentioned in a systematic review of genes reported to play a role in chemotherapy resistance in ovarian cancer 30 . IGF1 is well known to be involved in the development and progression of several types of tumors, including ovarian cancer 45 . Overexpression of CYR61 has been shown to be associated with development and poor prognosis of ovarian carcinomas 46 . NT5E-expressing cancers were reported to be resistant to platinum-based therapy 47 , while NR42A has been reported to be highly expressed in ovarian cancer tumors with poor prognosis 48 . We found that these HIF1α-regulated genes play significant roles in ovarian cancer first-line chemotherapy resistance, leading to disease progression and poor clinical outcome. These results suggest the possibility of developing and testing anti-HIF1α target based therapies for non-responders and partial remission patients.
In conclusion, we propose that this 97-prognostic gene signature could predict patient outcome in ovarian cancer. This gene signature can stratify subgroups of ovarian cancers into low-and high-risk groups in a reliable and reproducible manner across combined and independent patient cohorts. Our data suggest that this classifier may have considerable clinical relevance, especially in predicting first-line chemoresponse. This study will enable additional efforts to move ovarian cancer signatures closer to clinical relevance, and not only offer insight into mechanisms of chemoresistance but also provide information on potential target genes for future therapies.

Materials and Methods
Our evaluation was performed in four major steps: selection of both eligible genomic datasets and eligible prognostic models, transparent reimplementation of risk prediction models identified by literature review, selection of the gene signature from the training dataset, and multi-study validation of the selected gene signature using meta-analytic methods.
Eligibility Criteria and Implementation of the Prognostic Model. All datasets that had prognostic value in epithelial ovarian cancer were considered. Dataset GSE49997 from Austria was selected as the training dataset. Although it had 204 patients, only 184 patients were used. Twenty patients were excluded because they lacked OS information (Supplementary Table S1). Probe set identifiers were translated into gene symbols 27 . Eligibility criteria for datasets and samples. Data in the public domain that provided expression information about primary patient tumors, OS time and event, and related chip types were assessed. The analysis was repeated to exclude samples that did not fit the above mentioned criteria.
Information sources. Prognostic models were identified through PubMed searches. The following terms were used to search PubMed: ("ovarian cancer" OR "Ovary tumor" OR "ovary carcinoma" OR "ovarian tumor" OR "ovarian carcinoma" OR "microarray" OR "expression array" OR gene expression AND ovarian cancer gene signature).
All datasets used were downloaded from National Center for Biotechnology Information Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) and Cancer Genome Atlas database (http://cancergenome. nih.gov/). The robust multi-array averaging (RMA) method was used to average raw files. Seven different datasets were used with a total of 1,020 patients (Supplementary Table S1).
SCIenTIfIC REPORTS | 7: 9689 | DOI:10.1038/s41598-017-08766-5 Development of the prognostic gene expression signature. To develop a prognostic gene expression signature, dataset GSE49997 was used as the training dataset. Overall survival and gene expression data were combined to build a gene expression profiling based survival classifier 52 . The 5,218 genes measured were filtered by at least 2 absolute values of log 2 scale, which represented the same gene expression level. Univariate cox proportional hazard regression (P < 0.01) was then used to identify an OS associated gene expression signature from the discovery data set. For prognosis prediction, genes from the survival signature were applied in the survival risk prediction analysis. The prognostic index was computed by the formula Σ i w i x i − 0.467581 where w i and x i are the weight and logged gene expression for the i-th gene. A sample was predicted as high (low) risk if their prognostic index was larger than (smaller than or equal to) the median prognostic index of −0.038061. Initially, 137 genes associated with survival were discovered in the training dataset, however, only 97 genes were measured by both U133 plus 2.0 and U133A, and so these genes were used in all validations as the 97 gene signature (Supplementary Table S3).
Validation of the prognostic signature. Six independent datasets were used to validate the gene signature. Gene expression from different datasets was adjusted individually by subtracting the median expression value from all samples before merging. CCP was used as the class prediction algorithm to further refine this model and sub-stratify the predicted outcomes. Robustness was estimated by the misclassification rate that was determined in the training dataset during LOOCV 21 .
Kaplan-Meier survival analyses were performed after classification of patients into two risk groups, and Chi-square (χ 2 ) and log-rank tests were used to evaluate the survival risk between the two predicted subgroups of patients. Univariate and multivariate Cox proportional hazard regression analyses were performed to evaluate independent prognostic factors associated with survival. The gene signature, age, tumor grade, and FIGO stage were employed as covariates.
Pathway analysis. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) bioinformatics tool was used for Gene Ontology (GO) biological process enrichment analysis, P values of less than 0.05 was used to define significant pathways. Ingenuity Pathways Analysis (IPA) was also used for transcription factor and gene network analyses (http://www.ingenuity.com) 53 . Identified gene networks were ranked to score (Z-score = 02).

STRING analysis.
To predict protein-protein interactions, Search Tool for Retrieval of Interacting Genes/ Proteins (STRING) database V10.0 was used (http://www.string-db.org/). Proteins were linked based on the following six criteria: neighborhood, gene fusion, co-occurrence, co-expression, experimental evidence, and existing databases 54 .
Statistical methods for microarray data. Microarray data and heatmaps were analyzed using BRB-Array Tools Version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html) 55 . All other statistical analyses were accomplished in the R language environment (www.r-project.org) and Statistical Package for Social Sciences (SPSS) software (version 20, SPSS Inc., Chicago, IL, USA). In all statistical analyses, P values of less than 0.05 was considered significant.