Main

Epithelial ovarian cancer (EOC), with an estimated 21 290 new cases in 2015 (Siegel et al, 2015), is a heterogeneous disease from both morphological and molecular perspectives (Kobel et al, 2008; Cancer Genome Atlas Research Network, 2011). High-grade serous carcinoma (HGSC), the most common histological type and representing 70% of EOC cases, has been extensively studied in terms of its molecular profile and genomic landscape in many prognostic studies. For the less common histological types (endometrioid carcinoma (EC), clear cell carcinoma (CCC), mucinous carcinoma (MC), and low-grade serous carcinoma (LGSC)), prognostic studies aimed at understanding disease mechanisms and finding corresponding treatment options are lacking.

Among previous studies of rare histological types of EOC, very few focused on molecular associations with patient prognosis, partly because of the relatively small number of cases. In fact, only one published Australian expression array study incorporated LGSC and EC tumours to assess outcome (Tothill et al, 2008), and it did not include CCC or MC. Differences between HGSC and other histologies have also been observed at the genomic and epigenomic levels (Huang et al, 2012; Cicek et al, 2013), and several inherited susceptibility regions of EOC are specific to HGSC (Pharoah et al, 2013; Shen et al, 2013). Compared with HGSC, for which several validated outcome-associated signatures exist (Cancer Genome Atlas Research Network, 2011; Verhaak et al, 2013; Riester et al, 2014), there is a strong need to better understand the transcriptome of rare histological types from a disease outcome perspective.

In this study, we characterised associations between the expression profiles of 131 EC, CCC, MC, and LGSC EOC cases and progression-free survival (PFS). We hypothesised that an expression signature predictive of PFS is shared across EOC of rare histological types, motivated by several studies showing that different cancer types can share similar transcriptome features with therapeutic potential (Cancer Genome Atlas Research Network, 2012; Martinez et al, 2014). We performed a semi-supervised gene expression clustering analysis and revealed two underlying transcriptome classes in the rare EOC histological types, which were associated with differential outcome in both univariate and multivariate analyses. In addition, we conducted validation analyses in public data sets that contained expression measurements on the rare histological types, confirming the existence of the two discovered transcriptome classes and their significant association with PFS. Notably, the resulting classes were consistently predictive of PFS in rare histological tumours, but not in HGSC tumours, in both the Mayo Clinic and public data sets (that is, ‘Class-1’ cases had better outcome than ‘Class-2’ cases). Through pathway enrichment analysis, we also found that ‘Class-1’ tumours show more active metabolic activity producing steroid hormones and greater enrichment of the WNT signalling pathway, while ‘Class-2’ tumours are associated with upregulation of the cell cycle signalling pathway and the toll-like receptor (TLR) signalling pathway.

Materials and Methods

Mayo Clinic study participants and expression profiling

Eligible cases (n=131) were women aged 20 years or above who were ascertained between 1992 and 2009 at the Mayo Clinic with pathologically confirmed rarer histological types of invasive EOC (73 EC, 39 CCC, 14 MC, and 5 LGSC). Initial clinical diagnoses were confirmed by a gynaecologic pathologist (GLK), who verified histology and tumour grade and reviewed each tissue to ensure 70% tumour content prior to RNA extraction. Progression and vital status were obtained from the Mayo Clinic Tumor Registry, electronic medical records, and active patient contact. All cases provided informed consent for use of their tissues and medical records in research; all protocols were approved by the Mayo Clinic Institutional Review Board. Additional details on study participants have been described elsewhere (Cicek et al, 2013).

PFS time was defined as the time from the date of diagnosis to the date that second-line therapy was initiated for a clinically actionable tumour recurrence, accounting for the date of study entry (left truncation). Clinical characteristics examined as covariates included histology (EC, CCC, MC, LGSC), stage (I, II, III/IV), grade (low, high), surgical debulking status (no macroscopic disease, others), age (<50, 50–59, 60–69, 70–79, 80+), body mass index at diagnosis, pre-surgical CA125, and ascites (yes, no). Cox regression was used to estimate hazard ratios and 95% confidence intervals, including multivariate stepwise variable selection. Univariate analysis of clinical features revealed that histology, grade, stage, and surgical debulking status were significantly associated with PFS (P-value <0.05). Following stepwise variable selection for the multivariate analysis, only stage and surgical debulking status remained significantly associated with PFS. Clinical characteristics, along with univariate and multivariate associations with PFS, are shown in Table 1.
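
The effect of left truncation on the risk set can be illustrated with a small sketch. The Python snippet below is not the study's analysis code. It is a minimal Kaplan–Meier estimator in which a case counts as at risk only after study entry, which is the same risk-set rule a left-truncated Cox model relies on. The function name `km_left_truncated` and the toy times are hypothetical.

```python
def km_left_truncated(entry, exit_, event):
    """Kaplan-Meier estimate with delayed entry (left truncation).

    entry[i]: time from diagnosis to study entry
    exit_[i]: time from diagnosis to progression or censoring
    event[i]: 1 if progression was observed, 0 if censored
    Returns a list of (time, survival probability), one per event time.
    """
    event_times = sorted({t for t, d in zip(exit_, event) if d == 1})
    surv, curve = 1.0, []
    for t in event_times:
        # A case is at risk at t only if it has already entered the study
        # and has not yet progressed or been censored.
        at_risk = sum(1 for a, b in zip(entry, exit_) if a < t <= b)
        events = sum(1 for b, d in zip(exit_, event) if b == t and d == 1)
        if at_risk > 0:
            surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data (months since diagnosis), not taken from the study.
entry = [0, 0, 2, 5, 1]
exit_ = [6, 10, 8, 12, 4]
event = [1, 0, 1, 1, 1]
curve = km_left_truncated(entry, exit_, event)
```

In this toy example the case entering at month 5 does not contribute to the risk set at month 4, which is the point of the adjustment: cases cannot be counted as progression-free during a period in which they were not yet under observation.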

Table 1 Clinical characteristics and association with PFS for 131 Mayo Clinic EOC patients with endometrioid, clear cell, mucinous, or low-grade serous tumours

RNA from fresh frozen tumours of each patient was extracted and assessed using Agilent Whole Human Genome 4 × 44 K Expression Arrays as previously described (Goode et al, 2013; Konecny et al, 2014). Batch effects arising from Cy5 and Cy3 labelling differences observed among experimental batches were corrected using ‘ComBat’, an empirical Bayes approach (Johnson et al, 2007). TCGA-based HGSC transcriptome subtypes were assigned to each tumour as described previously (Konecny et al, 2014).

Semi-supervised expression clustering

For the Mayo Clinic internal discovery set (n=66), a semi-supervised clustering technique implemented in the R ‘Superpc’ package (Bair and Tibshirani, 2004) was applied to normalised log-ratio expression (Supplementary Figure 1A). The ‘semi-supervised’ aspect of the analysis determined a reduced set of features (gene probe sets) whose expression levels were associated with PFS. Using the internal discovery set, Cox models were fitted to each feature separately to examine the association between expression level and PFS. The features were then ranked by their strength of association with PFS, and the top expression probes were selected for subsequent clustering analysis using principal component analysis. The optimal number of expression probes was selected to be 960 using a 10-fold cross-validation procedure.
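
The feature-ranking step can be sketched as follows. This is an illustration, not the ‘Superpc’ implementation: for brevity it ranks features by the Cox score-test statistic evaluated at a zero coefficient instead of fitting a full Cox model per feature, and it ignores ties and left truncation. Those simplifications, the function names, and the toy data are all assumptions.

```python
import math


def cox_score_z(x, time, event):
    """Score-test z statistic for one feature in a univariate Cox model.

    Evaluated at beta = 0; tied event times and delayed entry are not handled.
    """
    u, v = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if d_i != 1:
            continue
        # Risk set: everyone still under follow-up at this event time.
        risk = [x[j] for j, t_j in enumerate(time) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0


def top_features(expr, time, event, k):
    """Rank features by |z| and return the names of the k strongest."""
    scores = {name: cox_score_z(x, time, event) for name, x in expr.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]


# Hypothetical toy data: three probes measured on six cases.
time = [2, 4, 6, 8, 10, 12]
event = [1, 1, 1, 0, 1, 0]
expr = {
    "probe_a": [5, 4, 3, 2, 1, 0],   # high expression, early progression
    "probe_b": [1, 0, 1, 0, 1, 0],   # weak relationship
    "probe_c": [1, 1, 1, 1, 1, 1],   # constant, carries no information
}
selected = top_features(expr, time, event, k=2)
```

In the study, the number of retained probes (960) was chosen by 10-fold cross-validation; here `k` is simply passed in.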

As implemented in the R ‘Superpc’ package, clustering was done by projecting the probe-by-sample expression data matrix of the selected probes onto the first principal component direction, using singular value decomposition. To achieve a discrete group assignment, the median of the first principal component projection was used as the cutoff (Bair and Tibshirani, 2004). On the basis of the predictive projection generated using the discovery set, we further predicted transcriptome class memberships using a centroid-based similarity score, as described in the following section.
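
A minimal sketch of the projection and median split is given below. It is not the ‘Superpc’ code: instead of a full singular value decomposition it uses power iteration on the sample-by-sample Gram matrix, which yields first-component sample scores up to scale. The function names and the toy matrix are hypothetical, and the sign of the component (and hence which group is labelled "high") is arbitrary.

```python
import statistics


def first_pc_scores(matrix, iters=200):
    """Sample scores (up to scale) on the first principal component.

    matrix: list of probes, each a list of expression values per sample.
    """
    # Centre each probe across samples.
    centred = []
    for row in matrix:
        mean = sum(row) / len(row)
        centred.append([v - mean for v in row])
    n = len(centred[0])
    # Sample-by-sample Gram matrix X^T X.
    gram = [[sum(r[a] * r[b] for r in centred) for b in range(n)]
            for a in range(n)]
    vec = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        nxt = [sum(gram[a][b] * vec[b] for b in range(n)) for a in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0.0:  # no variation in the data
            return [0.0] * n
        vec = [v / norm for v in nxt]
    return vec


def median_split(scores):
    """Dichotomise samples at the median of their component scores."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical toy matrix: 3 probes x 6 samples with two obvious groups.
toy = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    [0.5, 0.4, 0.6, 2.0, 2.2, 1.9],
    [2.0, 2.1, 1.9, 0.1, 0.0, 0.2],
]
labels = median_split(first_pc_scores(toy))
```

A median cutoff always produces two groups of roughly equal size, regardless of how separated the samples are along the component.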

Expression centroid-based class similarity score

Based on the class-predictive probes derived from semi-supervised clustering, we summarised the expression centroids for Class-1 and Class-2 as the averaged expression vectors of the samples assigned to each class in the Mayo Clinic discovery set. Similar to breast cancer studies that used expression centroids to determine transcriptome class (Tibshirani et al, 2002; Parker et al, 2009), we defined a class similarity score as the Pearson’s correlation coefficient between an expression centroid and the corresponding signature gene expression of a test sample. For simplicity, we defined a differential correlation score for each tumour sample as DiffCorr1vs2 = Pearson’s Correlation(sample, centroid1) − Pearson’s Correlation(sample, centroid2). A tumour sample with DiffCorr1vs2>0 is therefore assigned to ‘Class-1’, and a sample with DiffCorr1vs2<0 to ‘Class-2’.
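
The score defined above can be written out directly. The sketch below follows the stated definition; the function names, the toy centroids, and the handling of an exact zero score (returned as `None`, since the text does not address ties) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def diff_corr_1vs2(sample, centroid1, centroid2):
    """DiffCorr1vs2 = cor(sample, centroid1) - cor(sample, centroid2)."""
    return pearson(sample, centroid1) - pearson(sample, centroid2)


def assign_class(sample, centroid1, centroid2):
    """Assign 'Class-1' if the score is positive, 'Class-2' if negative."""
    score = diff_corr_1vs2(sample, centroid1, centroid2)
    if score > 0:
        return "Class-1"
    if score < 0:
        return "Class-2"
    return None  # exact tie; not covered by the definition in the text


# Hypothetical centroids over four signature features.
centroid1 = [1.0, 0.5, -0.5, -1.0]
centroid2 = [-1.0, -0.5, 0.5, 1.0]
sample_a = [0.8, 0.6, -0.4, -0.9]
sample_b = [-0.7, -0.2, 0.3, 1.1]
```

Because Pearson's correlation is invariant to shifting and rescaling a sample, the assignment depends on the pattern of expression across the signature and not on its absolute level.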

For the Mayo Clinic validation samples (N=65), which were profiled on the same Agilent 4 × 44 K platform as the discovery set, probe-level expression values were used to compute DiffCorr1vs2 and predict class memberships according to the probe-level centroids (Supplementary Table 1). For public data sets with expression measurements from different microarray platforms, gene-level expression values were used, restricted to the genes shared between the given sample and the gene-level centroids (Supplementary Table 2).
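
For cross-platform data, restricting to shared genes amounts to aligning two gene-keyed vectors before computing any correlation. A minimal sketch, with a hypothetical function name and placeholder gene labels:

```python
def align_on_shared_genes(sample, centroid):
    """Return the shared genes and the paired expression vectors over them.

    sample, centroid: dicts mapping gene symbol to expression value.
    """
    shared = sorted(set(sample) & set(centroid))
    return shared, [sample[g] for g in shared], [centroid[g] for g in shared]


# Placeholder gene labels; a platform may lack some signature genes.
sample = {"GENE_A": 1.2, "GENE_B": -0.3, "GENE_D": 0.7}
centroid = {"GENE_A": 0.9, "GENE_B": -0.5, "GENE_C": 0.1}
genes, sample_vec, centroid_vec = align_on_shared_genes(sample, centroid)
```

Sorting the shared genes keeps the two vectors in the same order, which is what a correlation between them requires.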

Other validation data sets

Several validation sets were also used (Bonome et al, 2008; Tothill et al, 2008; Crijns et al, 2009; Denkert et al, 2009; Mok et al, 2009; Cancer Genome Atlas Research Network, 2011; Mateescu et al, 2011; Bentink et al, 2012; Ferriss et al, 2012; Pils et al, 2012; Yoshihara et al, 2012; Karlan et al, 2014). One Mayo Clinic HGSC data set consisted of an additional 372 HGSC cases, of which 174 came from a previous study of EOC patients at the Mayo Clinic (Konecny et al, 2014). Fifteen public expression data sets were retrieved from a curated ovarian cancer transcriptome database (Ganzfried et al, 2013) that provides organised clinical information, such as survival/progression time, histology, grade, stage, and debulking status. From the database, two clinical annotations, ‘summarygrade’ (low-grade/high-grade) and ‘histological_type’, were used to identify expression samples of rare histological types. Of note, we used ‘grade’ 1 together with ‘histological_type’ to identify LGSC samples, in line with previous studies (Ayhan et al, 2009; Vang et al, 2009). Expression samples of tumours with undetermined or other histological types (e.g., borderline) were excluded. Where multiple probe sets mapped to a gene, the probe set with the highest mean across all data sets of the same platform was used (Miller et al, 2011).
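
The probe-to-gene collapsing rule can be sketched as below. The function name, probe identifiers, and gene labels are hypothetical, and the sketch operates on a single expression table, whereas the study took the mean across all data sets of the same platform.

```python
def collapse_probes(expr, probe_to_gene):
    """Keep, for each gene, the probe set with the highest mean expression.

    expr: dict mapping probe ID to a list of expression values.
    probe_to_gene: dict mapping probe ID to gene symbol.
    Returns a dict mapping gene symbol to the chosen probe's values.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # unannotated probe sets are dropped
            continue
        mean = sum(values) / len(values)
        if gene not in best or mean > best[gene][0]:
            best[gene] = (mean, values)
    return {gene: values for gene, (_, values) in best.items()}


# Hypothetical probes: two map to GENE_A, one is unannotated.
expr = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [4.0, 5.0, 6.0],
    "p3": [0.5, 0.5, 0.5],
    "p4": [9.0, 9.0, 9.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
gene_expr = collapse_probes(expr, probe_to_gene)
```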

After using ‘ComBat’ to remove per-study batch effects across the 15 data sets with at least 50 eligible samples, a total of 2460 EOC expression samples were used in the analysis, including 78 EC, 70 LGSC, 27 MC, 24 CCC, and 2261 HGSC. Of the 199 rare histological cases across 9 public data sets, 91 had recurrence information for univariate analysis, and 57 had PFS, stage, and debulking information for multivariate analysis. Of the 2261 HGSC cases across the 15 public data sets, 967 had recurrence information for univariate analysis, and 595 had PFS, stage, and debulking information for multivariate analysis. Details of the 15 studies included in the validation and of the clinical covariates are summarised in Supplementary Tables 3 and 4.

Results

Among the 131 Mayo Clinic invasive EOC patients studied, 37 (28.2%) experienced recurrent disease (Table 1). The internal discovery (n=66) and validation (n=65) sets were split randomly with a balanced distribution of clinical characteristics (Supplementary Table 5, Supplementary Figure 1A). Two transcriptome classes, referred to as Class-1 and Class-2, were derived from the discovery set only (n=66). In the discovery set, Class-1 cases were associated with longer PFS and fewer recurrence events than Class-2 cases (Figure 1A and Table 1). Using a cross-validation procedure, 960 expression probe sets mappable to 705 genes were selected for semi-supervised clustering and were predictive of the derived tumour classes (Supplementary Tables 1 and 2, Supplementary Figure 1B and C). To classify unseen samples in the internal and public validation sets, we assigned each sample to the class whose expression centroid it resembled more closely (see Materials and Methods).

Figure 1

Rare histological EOC cases with PFS information in the Mayo Clinic data set. (A) For the samples in the Mayo Clinic internal discovery set (n=66), the Kaplan–Meier plot of progression-free survival demonstrates a clear outcome difference between Class-1 and Class-2, with better outcome in the Class-1 group. (B) For the samples in the Mayo Clinic internal validation set (n=65), the Kaplan–Meier plot of progression-free survival confirms the outcome difference between Class-1 and Class-2. Cyan and purple colours indicate recurrence curves of Class-1 and Class-2 cases, respectively.

In the Mayo Clinic validation set, Class-1 membership made an independent contribution to predicting better PFS, as shown in Figure 1B and Table 2, with univariate analysis P=8.2 × 10−3 and multivariate analysis P=2.8 × 10−2 after adjusting for stage and debulking status. In the public validation sets, among samples with PFS information (n=91), Class-1 membership was associated with better prognosis in univariate analysis (P=1.4 × 10−3), as shown in Figure 2, and the association remained marginally significant in multivariate analysis (P=6.8 × 10−2, n=57). Details of the univariate and multivariate analyses are shown in Table 2. When examining clinical characteristics against predicted classes in the internal validation set and the combined public validation set, we found several consistent relationships with the two discovered classes (Supplementary Table 3): Class-1 tumours were significantly enriched for patients with low-grade, early-stage disease, and had a lower proportion of patients with CCC histology and a higher proportion with MC histology.

Table 2 Univariate and multivariate PFS associations in the Mayo Clinic and public data sets of endometrioid, clear cell, mucinous, or low-grade serous EOC patients
Figure 2

Rare histological EOC cases with PFS information across five data sets (n=91). (A) Kaplan–Meier plot of PFS showing the Class-1 and Class-2 outcome association for all the combined samples. (B) Per-study Kaplan–Meier plots. Cyan and purple colours indicate recurrence curves of predicted Class-1 and Class-2 cases, respectively.

In addition, when the TCGA expression signatures of the four molecular subtypes of HGSC EOC were applied to these samples, Class-1 tumours had a higher proportion of patients with the TCGA-defined ‘Differentiated’ molecular subtype and a lower proportion with the ‘Immunoreactive’ subtype (Cancer Genome Atlas Research Network, 2011; Verhaak et al, 2013). To examine the relationship between expression classes and stromal contamination, we also evaluated the association between class membership and tumour content, which was assessed by a gynaecologic pathologist (GLK), and found no significant association (P=0.14, Supplementary Table 3).

To investigate whether the transcriptome classes derived from rarer histology samples predict PFS in HGSC, we also assigned Class-1/Class-2 memberships in the Mayo HGSC data set and across 15 public data sets (Dressman et al, 2007; Wu et al, 2007; Bonome et al, 2008; Tothill et al, 2008; Crijns et al, 2009; Denkert et al, 2009; Mok et al, 2009; Yoshihara et al, 2010; Cancer Genome Atlas Research Network, 2011; Mateescu et al, 2011; Bentink et al, 2012; Ferriss et al, 2012; Pils et al, 2012; Yoshihara et al, 2012; Karlan et al, 2014). In the Mayo HGSC data set (n=372), the class membership resulting from the expression signature was not associated with PFS (P=0.17) (Figure 3A). In the 15 public data sets involving HGSC cases (n=967), Class-1/Class-2 membership was significantly associated with PFS in univariate analysis (P=1.9 × 10−2) (Figure 3B and C) but not in multivariate analysis (P=0.15) when adjusting for stage and debulking status (see Table 3 for details).

Figure 3

Mayo Clinic and public HGSC data sets with PFS information. (A) Kaplan–Meier plot of PFS showing the Class-1 and Class-2 outcome association for Mayo Clinic HGSC cases. (B) Kaplan–Meier plot for all the HGSC cases with PFS information in public data sets (n=967). (C) Per-study Kaplan–Meier plots. Cyan and purple colours indicate recurrence curves of predicted Class-1 and Class-2 cases, respectively.

Table 3 Univariate and multivariate PFS associations in the Mayo Clinic and public data sets of high-grade serous EOC patients

To identify pathways differentially enriched between the two transcriptome classes in the entire Mayo rare histology cohort, we first selected differentially expressed genes based on statistical confidence and substantial fold change (false discovery rate (FDR) <1% and absolute log2 fold change >0.5), resulting in 965 and 713 genes upregulated in Class-1 and Class-2, respectively (Supplementary Figure 2 and Supplementary Table 6). We then performed KEGG pathway enrichment analysis of these genes using the DAVID online annotation tool (http://david.abcc.ncifcrf.gov/) (Huang da et al, 2009a, 2009b). Significantly enriched pathways with FDR <20% are presented in Supplementary Table 7. For genes upregulated in Class-1, the most enriched pathways included ‘hsa00140: Steroid hormone biosynthesis’ (FDR=0.005%), which may suggest differences in hormone-producing metabolic activity between Class-1 and Class-2. Notably, genes upregulated in Class-2 were highly enriched in cell cycle activities: ‘hsa04110: Cell cycle’ (FDR=0.86%), consistent with more rapid recurrence in Class-2 patients. Genes upregulated in Class-2 were also enriched in the immune-related ‘hsa04620: Toll-like receptor signalling pathway’ (FDR=2.37%). Clinical covariates and predicted membership information for all the public expression samples used in this study (n=2460) are presented in Supplementary Table 8.
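
The gene-selection thresholds can be illustrated with a short sketch. The text does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, as are the function names and the toy values; the log2 fold change is taken as Class-1 relative to Class-2.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_by_direction(genes, log2fc, pvalues, fdr_cut=0.01, lfc_cut=0.5):
    """Return (upregulated in Class-1, upregulated in Class-2)."""
    q = benjamini_hochberg(pvalues)
    up1 = [g for g, f, qv in zip(genes, log2fc, q)
           if qv < fdr_cut and f > lfc_cut]
    up2 = [g for g, f, qv in zip(genes, log2fc, q)
           if qv < fdr_cut and f < -lfc_cut]
    return up1, up2


# Hypothetical toy values, not results from the study.
genes = ["g1", "g2", "g3", "g4"]
log2fc = [2.2, -1.0, 0.3, 0.9]
pvalues = [1e-6, 1e-5, 1e-7, 0.04]
up_class1, up_class2 = split_by_direction(genes, log2fc, pvalues)
```

In the toy data, `g3` is highly significant but fails the fold-change threshold, and `g4` has a large fold change but fails the FDR threshold, so neither is selected.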

Discussion

Historically, EOC has been classified according to patterns of abnormal differentiation and morphology as serous (fallopian tube-like), endometrioid (endometrium-like), mucinous (endocervical-like), and clear cell (mesonephros-like) (Auersperg et al, 2001), with serous histology samples further classified into low- and high-grade categories (Vang et al, 2009). In addition, some researchers have suggested that EOC histological types could be collapsed into two types: the so-called type-II refers to the HGSC type arising from the fallopian tube, and all the remaining types (i.e., EC, CCC, MC, and LGSC) belong to type-I (Kurman et al, 2008). Meanwhile, CCC, MC, and EC tumours present with differing clinical characteristics; CCC tumours are usually of advanced stage, and MC tumours are often low grade and diagnosed at an early stage. EC is itself a very diverse histological type: lower grade is usually associated with better outcome, whereas high-grade EC cases have often been reported with TP53 mutations and genome instability, resembling HGSC disease (Prat, 2012).

Despite the established histological classification of EOC, major challenges remain in clinical practice: very few therapeutic options are available for treating women with EOC, and treatment is not specific to histological type. Revealing prognostic tumour-based molecular information is therefore critical to understanding the disease and potentially finding treatment solutions for EOC patients. As the most common EOC histological type, HGSC has received much research attention, including from the TCGA (Cancer Genome Atlas Research Network, 2011), which resulted in the discovery of four expression subtypes with different activated pathways that are prognostic (Konecny et al, 2014) and may point to different potential therapeutic targets (Liu and Matulonis, 2014; Secord et al, 2014). Inspired by this successful example in HGSC, we investigated the existence of prognostic tumour expression classes in collections of rarer non-HGSC EOC.

In this expression study, we investigated the utility of transcriptome classes as markers to distinguish EC, MC, CCC, and LGSC EOC tumours with different progression risk. Our semi-supervised clustering analysis of whole-genome gene expression data identified two prognostic classes that were derived from a discovery set and replicated in a validation set of rare histological EOC tumours seen at the Mayo Clinic. The association of the derived classes with PFS was statistically significant even after controlling for the covariates that contribute to PFS in multivariate models (stage and debulking status); in contrast, histology was not predictive of PFS in the multivariate model, as shown in Table 1, underscoring the need to better understand molecular mechanisms. With the expression signature derived from the Mayo Clinic discovery set, we externally validated the existence of the two classes in nine public expression data sets with rare histological samples, and confirmed that Class-1 was associated with better PFS in the subset of samples with PFS information. Compared with the established clinical factors of stage and debulking status, Class-1/Class-2 membership provided additional prognostic value, as shown in Table 2. These PFS associations were not observed in either the Mayo Clinic or the public HGSC cohorts after controlling for stage and debulking (Table 3).

According to the PFS analysis of Mayo Clinic patients, stage and surgical debulking, as established factors affecting recurrence, were highly significantly associated with PFS (debulking status: univariate P-value=0.0001). This underscores the importance of early detection of EOC. In addition, an aggressive surgical effort leading to no remaining macroscopic disease, whenever possible, is critical to reducing the risk of tumour progression. Tumour grade was significantly associated with PFS in univariate analysis but lost its significance in multivariate analysis, which can likely be explained by the correlation between grade and stage (65.5% of low-grade rare histological cancers were diagnosed at stage I, Fisher’s exact test P=3.9 × 10−2). Similarly, histology alone was predictive of PFS, but did not provide additional predictive value beyond stage and debulking status. In contrast, our rare subtype transcriptome memberships significantly predicted PFS outcome after accounting for stage and debulking status in Mayo Clinic patients, as well as in the public data sets.

Compared with previous non-HGSC EOC studies (Tothill et al, 2008), our study represents the largest and most comprehensive collection, with the greatest number of histological types and external validations. Also, instead of combining HGSC and non-HGSC types (Tothill et al, 2008), we performed semi-supervised clustering only in non-HGSC types and investigated the resulting classes in HGSC and non-HGSC tumours separately. The advantage of this approach is that the predominance of HGSC did not influence the clustering. The distinctly different PFS associations of Class-1/Class-2 membership in patients with HGSC and non-HGSC tumours suggest that the two groups should be studied separately in the future.

Through pathway enrichment analysis following differential expression analysis, we also highlighted different biological pathways underlying each class of tumours. Class-1 tumours were associated with more active hormone activity, reflected by enrichment of the ‘Steroid hormone biosynthesis’ pathway and ‘Metabolism of xenobiotics by cytochrome P450’. Progesterone and oestrogen are steroid hormones that regulate the normal menstrual cycle, and they have been studied for potential roles in ovarian cancer aetiology and prognosis (Lukanova and Kaaks, 2005; Sieh et al, 2013). Notably, progesterone receptor (PR) was significantly upregulated (Class-1 vs Class-2 log2 fold change=2.2, t-test FDR=1.36 × 10−11, Supplementary Table 6). The known protective effects of progesterone may contribute to the less aggressive progression of Class-1 tumours. A related observation was reported by the Ovarian Tumour Tissue Analysis Consortium (Sieh et al, 2013), which found an association between high immunohistochemistry-based PR protein expression and improved disease-specific survival in EC (log-rank P<0.0001) and HGSC (log-rank P=0.0006). The other pathway enriched among ‘Class-1’ upregulated genes is the WNT signalling pathway. Although frequent mutations of pathway members were expected only for EC, the WNT pathway has been implicated in other ovarian histological types and has therefore been studied as a potential target for treatment (Arend et al, 2013, 2014).

In contrast, genes upregulated in Class-2 were associated with markedly active cell cycle activity and included several cell cycle regulator genes, such as cyclin E1 (Class-2 vs Class-1 log2 fold change=1.0, t-test FDR=3.26 × 10−7), a gene that is frequently amplified independently of BRCA1/2 mutations and is associated with primary treatment resistance in HGSC (Nakayama et al, 2010; Etemadmoghadam et al, 2013). Another enriched pathway associated with Class-2 is the TLR pathway, signalling through which in tumour cells may result in immunosuppression and thereby promote tumour growth (Muccioli and Benencia, 2014). Based on these pathway interpretations, ‘Class-1’ could tentatively be named the ‘hormone-WNT’ class and ‘Class-2’ the ‘cyclin-TLR’ class. We call for future studies that focus on class-specific pathway aberrations and that investigate treating ovarian tumours of rarer histological types according to the discovered classes.

In conclusion, this comprehensive study revealed the existence of two tumour transcriptome classes among EC, MC, CCC, and LGSC EOC and found that these classes were associated with PFS. Results in Mayo Clinic cases were validated in public expression samples of rarer non-HGSC histological types, but not in either Mayo Clinic or public HGSC EOC patients, suggesting that the discovered classes are unique to rarer histological types of EOC. Pathway enrichment analysis further showed that genes differentially upregulated in Class-1 and Class-2 appeared to be associated with distinct molecular pathways. Future work is needed to validate the current findings in even larger non-HGSC EOC collections and to consolidate the pathway mechanisms of the revealed transcriptome classes in order to investigate their therapeutic potential.