A seven-gene prognostic model for platinum-treated ovarian carcinomas

Background: Prognosis of ovarian carcinoma is poor, heterogeneous, and not accurately predicted by histoclinical features. We analysed gene expression profiles of ovarian carcinomas to identify a multigene expression model associated with survival after platinum-based therapy. Methods: Data from 401 ovarian carcinoma samples were analysed. The learning set included 35 cases profiled using whole-genome DNA chips. The validation set included 366 cases from five independent public data sets. Results: Whole-genome unsupervised analysis could not distinguish poor-prognosis from good-prognosis samples. By supervised analysis, we built a seven-gene optimal prognostic model (OPM) out of 94 genes identified as associated with progression-free survival. Using the OPM, we could classify patients into two groups with different overall survival (OS), not only in the learning set but also in the validation set. Five-year OS was 57% and 27% for the predicted ‘Favourable’ and ‘Unfavourable’ classes, respectively. In multivariate analysis, the OPM outperformed the individual current prognostic factors, both in the learning and the validation sets, and added independent prognostic information. Conclusion: We defined a seven-gene model associated with outcome in 401 ovarian carcinomas. Prospective studies are warranted to confirm its prognostic value, and to explore its potential for better tailoring systemic therapies in advanced-stage tumours.

Ovarian carcinoma is the leading cause of death from gynaecological malignancy in western countries. Its poor prognosis is linked to late diagnosis, usually made at an advanced stage, and to the development of chemoresistance. The classical therapeutic sequence consists of maximal debulking surgery followed by adjuvant platinum- and paclitaxel-based chemotherapy (Bristow et al, 2002; International Collaborative Ovarian Neoplasm Group, 2002; Ozols et al, 2003; Cannistra, 2004). Unfortunately, 20% of patients are refractory to chemotherapy, and >50% of those who achieve initial complete remission relapse and succumb to disease progression (McGuire et al, 1999). Overall survival (OS) is thus short (5-year OS: 30-40% for all stages) and has remained stable for two decades, notably in advanced stages (McGuire et al, 1999).
Ovarian carcinoma is clinically heterogeneous. Patients with morphologically similar, advanced-stage tumours display a broad range of clinical outcomes. Prognostic features, including patient's age, performance status, FIGO stage, histological tumour grade and subtype, and initial surgery results, are insufficient to capture the important individual variations in response to chemotherapy and survival. For example, it is impossible to predict which patients will benefit or not benefit from systemic first-line platinum/taxane-based chemotherapy. The consequence is that all women are given the same regimen although they will not display the same response and outcome.
This heterogeneous outcome suggests the existence of biologically different forms. Potential prognostic or predictive biomarkers, such as TP53, MYC, ABC transporters, BCL2, or BRCA genes (Williams et al, 2005; Kommoss et al, 2007; Walsh et al, 2008; Gadducci et al, 2009), have been identified. However, none has been validated for routine use. Large-scale RNA expression profiling has been used to find genes associated with response to chemotherapy (Spentzos et al, 2005; Bild et al, 2006; Lage and Denkert, 2007) or prognosis (Spentzos et al, 2004; Hartmann et al, 2005; Bonome et al, 2008) in ovarian carcinomas. However, these studies were limited to small or medium-sized populations, notably regarding the validation set.
Our objective was to identify, from primary ovarian cancer biopsies, a molecular predictor associated with increased survival following platinum-based chemotherapy, and to validate its performance in a large panel of independent tumours.

Sample selection
Pre-treatment ovarian cancer samples from 35 patients who underwent initial surgery followed by platinum-based chemotherapy were available for RNA profiling. They were collected at the Institut Paoli-Calmettes (IPC) between January 1994 and June 2007. Each patient gave written informed consent for molecular analysis. This study was approved by our institutional ethics committee. Samples were macrodissected by pathologists and frozen within 30 min of removal. All profiled specimens were reviewed by a pathologist (JJ) before RNA extraction and contained >60% of tumour cells. After surgery, patients were treated using platinum-based chemotherapy according to standard guidelines.

Gene expression profiling
Total RNA was isolated with the AllPrep DNA/RNA kit (Qiagen, Valencia, CA, USA). RNA integrity was assessed with a 2100 Agilent bioanalyser (Agilent, Palo Alto, CA, USA).
Gene expression analysis was done with Affymetrix Human Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA), as recommended by the manufacturer (http://www.Affymetrix.com).
We limited our expression analysis to gene level using only known and identified transcripts (Core library, Affymetrix). Analyses are described in Supplementary materials and methods available online. They include unsupervised and supervised approaches. Supervised analysis (see study flowchart, Supplementary Figure 1) aimed at identifying in the IPC set a multigene expression predictor for progression-free survival (PFS).
First, Cox regression analysis identified genes whose expression (continuous variable) was associated with PFS (P ≤ 0.01, Wald's test). A median expression profile of progressive samples was computed from these differential genes. A correlation score (Pearson's coefficient) of each sample with this profile was computed and used to classify samples. Two groups of samples were thus defined: an 'Unfavourable group' defined by a positive score and a 'Favourable group' defined by a negative score.
Second, we defined, from these differential genes, an optimal prognostic model (OPM). Recursive iterations were performed with a multivariate Cox model. Variable selection was done with an iterative two-step method including leave-one-out cross-validation. The 'Forward' step identified the most significant variable to classify the tumours. If its significance rate was >1% and the resulting classification was better than that of the previous model, the variable was kept. The 'Backward' step then removed variables one at a time from this new model, in reverse order, and evaluated all possible combinations to choose the best-performing one. This step was repeated until the model could not be improved. Once the best model (OPM) was defined, a prediction score, given by the linear function resulting from the Cox model, was calculated for each sample, thus defining two classes: the 'Unfavourable' class with a positive score and the 'Favourable' class with a negative score.
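The correlation-based classification described in the first step can be illustrated with a minimal sketch. It is not the authors' implementation (their analyses were run in R and are detailed in the Supplementary materials); the function names, the toy expression values, and the choice to assign a score of exactly zero to the 'Favourable' group are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_by_correlation(expr, progressed):
    """Classify samples by their correlation with the median profile
    of progressive samples.

    expr: dict mapping sample id -> expression values of the
          differential genes (same gene order for every sample).
    progressed: set of sample ids that progressed.
    Returns a dict: sample id -> (score, group).
    """
    n_genes = len(next(iter(expr.values())))
    # Gene-wise median over the progressive samples only.
    profile = [median(expr[s][g] for s in progressed) for g in range(n_genes)]
    result = {}
    for sample, values in expr.items():
        score = pearson(values, profile)
        # Positive score -> 'Unfavourable'; negative -> 'Favourable'.
        # A score of exactly 0 falls into 'Favourable' here (assumption).
        result[sample] = (score, "Unfavourable" if score > 0 else "Favourable")
    return result


# Hypothetical toy data: four samples, four genes.
toy_expr = {
    "S1": [2.0, -1.0, 1.5, -0.5],
    "S2": [1.0, -2.0, 2.5, -1.5],
    "S3": [-1.5, 1.0, -2.0, 0.5],
    "S4": [-1.0, 2.0, -1.0, 1.0],
}
toy_groups = classify_by_correlation(toy_expr, progressed={"S1", "S2"})
for sample, (score, group) in sorted(toy_groups.items()):
    print(sample, round(score, 3), group)
```

In this toy example the two progressive samples correlate positively with their own median profile, while the other two correlate negatively and are therefore labelled 'Favourable'.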
To validate the predictive performance of the model in independent ovarian carcinoma samples, we analysed five publicly available data sets (Berchuck et al, 2005, 2009; Partheen et al, 2006; Tothill et al, 2008; Denkert et al, 2009). We first identified the common genes. Then, we median-centred the corresponding gene expression values within each data set (Berchuck's sets were pooled and duplicates were excluded). The prediction score (OPM) defined two classes: 'Unfavourable' (positive score) and 'Favourable' (negative score). Regarding the prognostic analysis, the clinical outcome available in these studies was OS. Time to death was available in three studies. In the other two, survival was reported only categorically: as 'Long survivors' (OS longer than 7 years) or 'Short survivors' (OS shorter than 3 years) in one study (Berchuck et al, 2005), and as OS shorter or longer than 5 years in the other (Partheen et al, 2006).
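As an illustration of this validation step, the sketch below median-centres each gene within one data set and then applies a linear prediction score restricted to the genes shared between the model and the data set. It is a simplified example under stated assumptions: the gene names (GENE_A, GENE_B, GENE_C, GENE_X), the coefficients, and the expression values are hypothetical placeholders, not the published model.

```python
from statistics import median

# Hypothetical model coefficients (placeholders, not the published OPM).
# A positive coefficient pushes the score towards 'Unfavourable'.
HYPOTHETICAL_COEFS = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 0.8}


def median_centre(dataset):
    """Subtract each gene's median across the samples of one data set.

    dataset: dict mapping gene -> list of expression values (one per sample).
    """
    centred = {}
    for gene, values in dataset.items():
        m = median(values)
        centred[gene] = [v - m for v in values]
    return centred


def prediction_scores(dataset, coefs):
    """Linear score per sample, using only genes present in both the
    model and the data set (the 'common genes')."""
    common = [g for g in coefs if g in dataset]
    centred = median_centre({g: dataset[g] for g in common})
    n_samples = len(next(iter(centred.values())))
    return [
        sum(coefs[g] * centred[g][i] for g in common)
        for i in range(n_samples)
    ]


def to_class(score):
    """Positive score -> 'Unfavourable'; otherwise 'Favourable'."""
    return "Unfavourable" if score > 0 else "Favourable"


# Hypothetical data set with three samples; GENE_C is absent, GENE_X is unused.
toy_dataset = {
    "GENE_A": [5.0, 7.0, 9.0],
    "GENE_B": [3.0, 1.0, 2.0],
    "GENE_X": [0.1, 0.2, 0.3],
}
toy_scores = prediction_scores(toy_dataset, HYPOTHETICAL_COEFS)
toy_classes = [to_class(s) for s in toy_scores]
print(toy_scores, toy_classes)
```

Centring within each data set, rather than across the pooled samples, is what allows a single score threshold of zero to be applied to series generated on different platforms.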

Statistical analysis
All statistical analyses were done in R version 2.6.1 (http://cran.r-project.org) and its associated packages. Details about clinical definitions and statistical analyses are given in Supplementary materials and methods.

Gene expression profiling of ovarian carcinoma
We profiled 35 ovarian cancer samples from patients who underwent oophorectomy at the IPC. All cases were adenocarcinomas treated with platinum-based chemotherapy after surgery. Their characteristics are summarised in Table 1. Unsupervised analysis based on hierarchical clustering distinguished two groups of samples without any significant correlation with histoclinical data (Figure 1), and specifically with clinical outcome. Kaplan-Meier analyses showed P-values of 0.88 and 0.09 for PFS and OS, respectively. Consistent with previous studies (Schaner et al, 2003), we identified coherent gene clusters involved in a specific biological function or sharing a chromosomal location. Seven of them are shown in Figure 1. Two of them included genes involved in stromal environment and cellular movement. The first one (cluster A) contained genes coding for proteins of the extracellular matrix (COL1A1, COL1A2, FN1, VIM, and MMP2) or involved in cellular mobility (MYH11, MYL9, and MYLK). The second one (cluster F) included genes coding for proteins involved in cellular adhesion, such as the claudins (CLDN3, CLDN4, and CLDN7) or CDH1. It is of note that expressions of these clusters were anticorrelated, suggesting that they may represent the two opposite sides of the same mechanism. Two other clusters were associated with immune response: cluster C was associated with the complement pathway (C1QA, C1QB, C2, C3, and CFB), the class II major histocompatibility complex (DQA2, DRA, and DMA) and the natural killer lymphocyte pathway (FCER1G, LAIR1, and TYROBP); cluster D contained genes linked to the interferon pathway (IFI6, IFI27, and IRF9). We also observed a cluster linked to the 17q12 chromosomal region (cluster E), including ERBB2 and neighbouring genes (TMEM99, PERLD1, C17orf157). Cluster B contained genes involved in early cell response (EGR, JUNB, and FOS). Finally, several genes involved in cell-cycle control pathways such as DNA damage repair (BARD1, FANCF, E2F3, and E2F5), checkpoint control (CHEK1, CCNB1, CCNB2, TOP2A, and TOP2B), and apoptosis (BAK1, CASP2, CASP8, CDC2, and MAPK8) were in cluster G.

Identification of a multigene predictor of survival
We then searched for a multigene expression predictor for PFS. Progression-free survival was correlated with mRNA expression of 94 genes (Table 2). Figure 2 shows the classification of the 35 cases according to this gene expression signature. The two groups identified showed a PFS difference (P = 4.44E-06, log-rank test). We then sought to define, among those 94 genes, an OPM with fewer genes, which would potentially be easier to apply in clinical practice. Multivariate Cox analysis retained a seven-gene model (Supplementary Table 3), including two genes (A1BG and PAH) associated with unfavourable outcome and five genes (SLC7A2, ALCAM, TMPRSS3, TSPAN6, and C14orf101) associated with favourable outcome. Using a linear predictor, we defined two classes of tumours as 'Favourable' (n = 21; 60%) and 'Unfavourable' (n = 14; 40%). As expected, they displayed different clinical outcomes (Figure 3) with respective 2-year PFS equal to 46 and 0% (P = 6.06E-07, log-rank test) and respective 2-year OS equal to 90 and 46% (P = 3.29E-03, log-rank test).
Next, we analysed correlations between these two classes identified by our OPM and histoclinical features of samples: age, histological subtype, grade, stage, taxane use, surgical status, and pathological and clinical responses. As shown in Table 2A, we found no correlation except with survival. However, the rate of clinical complete response (CCR) was higher in the 'Favourable' class than in the 'Unfavourable' class (82 vs 46%), suggesting that the prognostic value of the OPM might be partly related to some predictive value for response to chemotherapy. Nevertheless, the OPM seemed to have a prognostic value within the subset of patients with CCR. In this subset, 'Favourable' cases had a 2-year and a 5-year PFS of 55% and 38%, respectively, vs 0% in the 'Unfavourable' group (P = 2.56E-05, log-rank test). Thus, our model was able to identify poor-prognosis cases among those presenting the same response after treatment, suggesting a prognostic value linked to the natural evolution of the disease, independent of the response to chemotherapy.
Then, we compared our OPM with classical prognostic factors regarding the association with PFS: age, grade, stage, taxane use, and surgical status (Table 3A). Univariate analysis showed that FIGO stage, surgical status, and the OPM were correlated with PFS. In multivariate analysis, the OPM remained significant (P = 2.2E-03, Wald's test), together with the FIGO stage, while the surgical status lost its prognostic value.

External validation of the OPM
We sought to demonstrate the robustness and prognostic independence of our OPM in an independent validation set. We collected and pooled data from five recent prognostic studies of ovarian carcinomas, including 366 advanced-stage tumours (FIGO stages III and IV). Their characteristics are summarised in Table 4. The clinical endpoint available through the five series was OS: the information (dead or alive) was available for all 366 patients, with survival times mentioned for 262 patients. A total of 172 out of 366 women died from disease. For the 262 cases with reported survival times, 112 died and 150 remained alive with a median follow-up of 32 months (range 1-166) and a 5-year OS equal to 39% (31-49%). Applied to these samples, the seven-gene OPM defined a 'Favourable' class (n = 187; 51%) and an 'Unfavourable' class (n = 179; 49%) that strongly correlated with survival. The 'Favourable' class contained 121 of the 194 alive patients and the 'Unfavourable' class contained 106 of the 172 deceased patients (Se = 61.6%, Sp = 62.4%, odds ratio = 2.7, 95% CI = 1.7-4.2; P = 6.1E-06, Fisher's exact test), thus confirming the prognostic value of our OPM in a large and independent data set. In this validation set (Table 2B), patients classified as 'Favourable' were slightly younger (P = 0.02) and had a better OS than 'Unfavourable' cases with 5-year OS of 57 vs 27%, respectively (P = 1.56E-05; Figure 4A).
In this data set, univariate analysis for OS retained as significant the same features as in the learning set, that is, the OPM-based classification, as well as the classical FIGO stage and the amount of residual disease after surgery (Table 3B). In multivariate analysis, our seven-gene model and the FIGO stage remained significant, further underlining the robustness of the model and its capacity to predict clinical outcome independently of other clinical features and better than the residual disease after surgery. Indeed, OS was higher in the 'Favourable' cases than in the 'Unfavourable' cases, both in tumours without residual disease (5-year OS: 73 vs 30%; P = 1.3E-03) and in tumours with residual disease after surgery (5-year OS: 43 vs 21%, P = 1.9E-03; Figure 4B). Moreover, 'Favourable' cases with residual disease after surgery displayed longer survival than tumours without residual disease but with an 'Unfavourable' profile, although the difference was not significant (P = 0.12).
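The sensitivity, specificity, and odds ratio quoted above follow directly from the reported counts, as the short sketch below shows. The function and variable names are illustrative; the confidence interval and the Fisher's exact test P-value are not recomputed here.

```python
def two_by_two_summary(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity, specificity, and sample odds ratio of a 2x2 table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    odds_ratio = (true_pos * true_neg) / (false_pos * false_neg)
    return sensitivity, specificity, odds_ratio


# Counts reported in the text for the validation set.
deceased_total, alive_total = 172, 194
deceased_unfavourable = 106   # deceased patients classified 'Unfavourable'
alive_favourable = 121        # alive patients classified 'Favourable'

# Remaining cells of the table, obtained by subtraction.
deceased_favourable = deceased_total - deceased_unfavourable   # 66
alive_unfavourable = alive_total - alive_favourable            # 73

se, sp, odds = two_by_two_summary(
    true_pos=deceased_unfavourable,
    false_neg=deceased_favourable,
    true_neg=alive_favourable,
    false_pos=alive_unfavourable,
)
print(f"Se = {se:.1%}, Sp = {sp:.1%}, odds ratio = {odds:.1f}")
```

The same counts also recover the class sizes given in the text: 121 + 66 = 187 'Favourable' and 106 + 73 = 179 'Unfavourable' cases.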

DISCUSSION
Despite frequent initial chemosensitivity, the prognosis of advanced-stage ovarian cancer is poor with a long-term OS of 25%. Classical prognostic criteria are insufficient to accurately predict the survival of an individual patient, and need to be improved. Response to chemotherapy is an imperfect prognostic factor as it correlates more with immediate clinical outcome than with long-term PFS and OS, which depend on additional factors such as the invasive potential and growth of the tumour. Early identification of the ~80% of patients who will die from disease progression despite the initial response to standard treatment is crucial. It should help guide initial therapy towards experimental approaches such as novel first-line drugs, novel strategies such as intra-peritoneal chemotherapy or maintenance chemotherapy, or existing alternative chemotherapy regimens instead of the standard regimen. Some gene profiling studies have addressed the issue of survival prediction in ovarian cancer (for review, see Sabatier et al (2009)). In most of them, however, the validation of the multigene predictor was either absent or done on a relatively small validation set, fewer than 118 samples for the largest (Denkert et al, 2009).
Using whole-genome DNA microarrays, we profiled a unicentric series of 35 pre-treatment platinum-treated ovarian carcinomas, including a majority of advanced stages. Supervised analysis of gene expression data identified 94 genes whose expression was correlated with PFS, including genes involved in DNA repair (APEX1, WDR6, and PARP2) and apoptosis (CCAR1, CASP2, IKBKB, and PDCD6IP), or known to be associated with malignancy (S100A8, FNTA, and CLUAP1). It is of note that most of these 94 genes were not present in previously published signatures. This discrepancy between prognostic gene signatures identified using high-throughput technologies has already been reported in several cancers, notably breast (Bertucci et al, 2006) and ovarian (Sabatier et al, 2009) cancers. It can be explained not only by the technological and methodological differences between these studies, but also by the relatively small size of the populations analysed and by patient heterogeneity, in terms of the definitions of both clinical and pathological features. In this context, the validation of a signature in large and independent series is crucial to confirm its robustness. From this 94-gene list, we established a seven-gene OPM able to classify our samples, independently of classical prognostic features, into two classes with different clinical outcomes: a 'Favourable' class with 5-year OS of 56%, and an 'Unfavourable' class with 5-year OS of 10%. Importantly, when applied to a large independent validation set (N = 366), which represents so far the largest one reported in the literature, our model maintained its strong and independent prognostic value in multivariate analysis with 5-year OS of 57% in the 'Favourable' class and 27% in the 'Unfavourable' class.
Whether our model reflects the chemosensitivity of the tumour and/or its metastatic and proliferative potential cannot be determined, but interestingly, it remained predictive of survival when applied to the homogeneous group of patients with CCR to chemotherapy, suggesting it is partly independent of chemosensitivity. The two genes of the model associated with poor prognosis, A1BG and PAH, are known to be involved in cancer and particularly in ovarian neoplasm development. Phenylalanine hydroxylase concentrations were higher in patients with advanced-stage disease (Neurauter et al, 2008) and correlated with concentrations of immune markers (tumour necrosis factor-α receptor and neopterin). Alpha-1-B glycoprotein (A1BG), a secreted protein of unknown function, was underexpressed in the urine of bladder cancer patients (Kreunin et al, 2007). It presents several similarities with its opossum homologue, Oprin (Catanese and Kress, 1992). Oprin has a metalloproteinase inhibitor function and is similar to TIMP (tissue inhibitor of metalloproteinase), which can have a role in angiogenesis, cellular proliferation, and tumour progression (Chirco et al, 2006). Expression of A1BG is also stronger in pancreatic cancer than in normal pancreas (Tian et al, 2008). Three of the five genes of our model correlated with good prognosis are implicated in oncogenesis. TMPRSS3 is overexpressed in pancreatic cancers when compared with normal pancreatic and pancreatitis tissues (Wallrapp et al, 2000). It is overexpressed in ovarian carcinomas as compared with non-malignant tissues (Sawasaki et al, 2004), but its prognostic value has never been evaluated. ALCAM protein has been associated with prognosis in melanoma (van Kilsdonk et al, 2008), ovarian (Mezzanzanica et al, 2008), breast (Ihnen et al, 2010), prostate (Kristiansen et al, 2005), colorectal (Weichert et al, 2004), and pancreatic (Kahlert et al, 2009) cancers. SLC7A2 expression is higher in oestrogen receptor (ER)-positive breast tumours than in ER-negative ones (Tozlu et al, 2006).
In conclusion, we have developed, and validated in a large series of samples, a seven-gene model associated with survival of platinum-treated ovarian carcinoma patients. If further retrospective and prospective validation studies confirm its relevance, our model could help tailor the systemic treatment of advanced-stage ovarian cancer. Based on their low likelihood of achieving prolonged survival with standard first-line platinum-based therapy, the 'Unfavourable' patients might be guided, at the time of diagnosis, towards investigational treatment approaches to be defined. Furthermore, a better understanding of the implication in ovarian oncogenesis of the genes present in our model might help develop alternative therapies.

Figure 4 (B) Similar to (A) with stratification based on the residual disease after surgery (Opt., optimal surgery; Sub-Opt., suboptimal surgery). P-values were assessed with the log-rank test.