Main

The majority of women diagnosed with ovarian carcinoma who are treated according to standard protocols will die from their disease,1 and there has been minimal improvement in outcome over the past two decades.2 There has been progress in refining the morphological criteria for subtyping of ovarian carcinomas, and diagnostic reproducibility has recently improved to an excellent level.3, 4 The five major subtypes of ovarian carcinoma (high-grade serous, clear cell, endometrioid, mucinous, and low-grade serous), which can be identified using morphological criteria, differ with respect to genetic risk factors, precursor lesions, molecular alterations, stage at presentation, and clinical behavior,3, 5, 6, 7 and are best considered distinct disease entities.8 Because the subtypes differ in chemosensitivity, there has been a call for subtype-specific clinical trials to identify more effective therapies for those subtypes (ie clear cell, mucinous) that are resistant to conventional platinum/taxane chemotherapy.9, 10, 11

It has recently been shown that, for patients with advanced stage ovarian carcinoma, neoadjuvant chemotherapy followed by debulking surgery yields outcomes similar to those of chemotherapy administered after surgery, with the advantages of decreased morbidity, reduced surgical time, and more rapid post-operative recovery.12 Neoadjuvant chemotherapy is effective for high-grade serous carcinomas but not for the other subtypes.9, 10 If neoadjuvant chemotherapy is to be administered only to specific ovarian carcinoma subtypes, the question arises of how reliably subtype can be diagnosed on limited samples, such as core biopsies or fine needle aspiration biopsies. Although excellent reproducibility is possible when diagnosis is based on examination of multiple slides from a well-sampled tumor,4 there is no evidence that similar reproducibility can be achieved on a small tumor sample. We have recently shown that the major ovarian carcinoma subtypes differ significantly in their biomarker expression profiles,8 suggesting that biomarker expression could aid subtype diagnosis. We therefore searched for the smallest panel of immunohistochemical biomarkers with the highest sensitivity and specificity for diagnosis of the five major subtypes of ovarian carcinoma.

Materials and methods

Ethics Statement

Approval for the use of these cohorts for this study was obtained from the Research Ethics Board of the British Columbia Cancer Agency and University of British Columbia.

Study Cohorts and Inclusion Criteria

This study utilized three independent cohorts, two for the generation of prediction equations and one for the validation of those prediction equations (Figure 1).

Figure 1

Experimental design for the study detailing independent discovery of the immunohistochemical marker panel from two unrelated cohorts, prediction accuracy represented by the κ statistic (κ), and internal validation.

The first cohort, hereafter referred to as the archival cohort, contained 500 formalin-fixed, paraffin-embedded ovarian carcinoma tissue samples from a previously described retrospective, population-based cohort of patients in British Columbia diagnosed between 1984 and 2000. These samples came from >20 hospitals in British Columbia, with no standardization of specimen fixation or processing. The primary eligibility criterion was a diagnosis of chemonaive ovarian carcinoma in which primary cytoreductive surgery left the patient free of macroscopic residual disease. As a result of this case definition, the cohort contained a relatively large proportion of non-serous carcinomas compared with what would be expected in a series including all patients with ovarian carcinoma.6

The second cohort, hereafter referred to as the tumor bank cohort, was a single hospital-based set of 292 samples from patients diagnosed with ovarian carcinoma between 2001 and 2008, obtained from the Gynaecologic Tissue Bank at Vancouver General Hospital. These samples represent high-quality tissue with short devitalization times and standardized fixation and tissue processing.

A third cohort, hereafter referred to as the validation cohort, was assembled from consecutively diagnosed ovarian carcinoma cases seen at five centers in Canada, and included samples collected from 2006 to 2009 that were part of a recent histomorphological study.4

Tumor Classification

All histological slides underwent pathological review (CBG and MK), and each case was assigned to one of the five major subtypes (high-grade serous, clear cell, endometrioid, mucinous, low-grade serous) or to ‘other’ (including mixed carcinomas and rare types such as undifferentiated, squamous, malignant Brenner tumor, and unclassified) according to modified WHO criteria, as recently described.3, 7 Carcinomas with a transitional growth pattern on low-power examination but other features of high-grade serous carcinoma, for example, slit-like spaces or a microcystic pattern, were classified as a variant of high-grade serous carcinoma, ie high-grade serous carcinoma (with transitional features). All cases diagnosed as ‘other’ (ie not one of the five main subtypes of ovarian carcinoma) were excluded from this study. Furthermore, agreement between the two review pathologists on the histological subtype was required; cases with disagreement were excluded (98 of 873 cases from the three case series, 11.2%). Finally, only cases with complete data for all 22 immunostains were subjected to statistical analysis; 138 of the remaining 775 cases (17.8%) lacked complete data and were excluded.

Marker Selection

We searched for biomarkers that are differentially expressed between histological subtypes. Ideal candidate markers for the panel had a commercial antibody suitable for immunohistochemistry on formalin-fixed, paraffin-embedded tissue and were easy to use and interpret. From our previous studies, we selected 11 markers: Wilms Tumor 1 (WT1); tumor protein p53 (TP53); hepatocyte nuclear factor 1-β (HNF1B); estrogen receptor (ESR); progesterone receptor (PGR); vimentin (VIM); epithelial cell adhesion molecule (EPCAM); mesothelin (MSLN); insulin-like growth factor 2 mRNA binding protein 3, IMP3 (IGF2BP3); cadherin 6, type 2, K-cadherin (CDH6); and cyclin-dependent kinase inhibitor 1A (p21, Cip1) (CDKN1A).8, 13, 14 We added eight markers reported in the literature as differentially expressed: cyclin-dependent kinase inhibitor 2A (p16) (CDKN2A); mouse double minute 2 (MDM2); catenin (cadherin-associated protein), beta 1, 88 kDa (CTNNB1); phosphatase and tensin homolog (PTEN); mucin 5AC, oligomeric mucus/gel-forming (MUC5AC); paired box 2 (PAX2); secretoglobin, family 1A, member 1 (SCGB1A); and CD44 molecule, splice variant 6 (CD44v6).15, 16, 17, 18, 19, 20, 21 Three additional markers, growth differentiation factor 15 (GDF15), trefoil factor 3 (intestinal) (TFF3), and dickkopf homolog 1 (DKK1), were derived from comprehensive gene expression profiling data generated using the Human Exonic Evidence Based Oligonucleotide microarray (Stanford, CA, USA) or whole transcriptome sequencing, as recently described.22, 23

Immunohistochemistry

Three tissue microarrays were constructed from representative tumor areas, as described previously,13 using a tissue arrayer (Beecher Instruments, Silver Spring, MD, USA and Pathology Devices, Westminster, MD, USA). Two cores of 0.6 mm diameter were taken from each donor block and transferred to the recipient block.

Sections of 4 μm thickness were cut and stained within 2 weeks of sectioning. Details of the antibodies for the final set of nine markers, and of the staining procedures, are listed in Table 1. Two pathologists (MK and BG) independently scored these biomarkers visually on digitized images (BLISS scanner, Bacus Laboratories/Olympus America, Lombard, IL, USA). All biomarkers were scored as positive or negative using a cutoff of >1% of cells staining positively (Figure 2), with two exceptions. For CDKN2A, positive staining was defined as strong cytoplasmic and nuclear staining in >75% of tumor cells, as defined previously.24 For TP53, a three-tier scoring system was used (complete absence: score 0; 1–50%: score 1; >50%: score 2), because there is evidence that scores of 0 and 2 both correlate with underlying mutations, while a score of 1 correlates with wild-type TP53,25 as illustrated in Figure 3. Fewer than 5% of cases showed discrepant results between the two scoring pathologists; for these cases, the higher score was used.
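
The scoring rules above can be expressed as a small function. This is a minimal sketch, not the authors' software: the `Reading` record and its field names are hypothetical, and because the text gives the intermediate TP53 tier as 1–50%, the sketch assumes that any non-zero staining up to 50% counts as score 1.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical record of one marker read on one case (illustrative only)."""
    marker: str                        # e.g. "WT1", "CDKN2A", "TP53"
    pct_positive: float                # percentage of tumor cells staining, 0-100
    strong_cyto_nuclear: bool = False  # strong cytoplasmic + nuclear staining (CDKN2A only)


def score_marker(r: Reading) -> int:
    """Return the marker score using the cutoffs described in the text."""
    if r.marker == "TP53":
        # Three-tier score: complete absence -> 0, up to 50% -> 1, >50% -> 2.
        # Assumption: any non-zero staining up to 50% is treated as score 1.
        if r.pct_positive == 0:
            return 0
        return 1 if r.pct_positive <= 50 else 2
    if r.marker == "CDKN2A":
        # Positive only with strong cytoplasmic and nuclear staining in >75% of tumor cells.
        return int(r.pct_positive > 75 and r.strong_cyto_nuclear)
    # All other markers: positive if >1% of cells stain.
    return int(r.pct_positive > 1)


def reconcile(score_a: int, score_b: int) -> int:
    """When the two pathologists disagree, the higher score is used."""
    return max(score_a, score_b)


print(score_marker(Reading("TP53", 80.0)))          # 2
print(score_marker(Reading("CDKN2A", 90.0, True)))  # 1
print(score_marker(Reading("WT1", 0.5)))            # 0
```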

Table 1 Final antibody list and staining procedures
Figure 2

Representative immunostains for DKK1, HNF1B, MDM2, PGR, TFF3, VIM, and WT1. The cutoff between negative (score 0) and positive (score 1) is indicated by the bold vertical line.

Figure 3

Representative immunostains for CDKN2A and TP53. Cutoffs are indicated by bold lines. For CDKN2A, negative or weak positive staining is score 0, while strong positive staining is score 1. TP53 was scored using a three-tier system: 0, negative; 1, weak positive; 2, strong positive. For details of scoring, see Materials and methods.

Statistical Analysis

Contingency analysis and Pearson's χ² statistic (with unadjusted P-values) were used to test for heterogeneity in the expression of each biomarker between the archival and tumor bank cohorts within each subtype. Prediction equations were generated by full model fit nominal logistic regression, starting with the full panel of 22 markers and using a manual, iterative backward elimination process. At each step, the marker with the highest P-value in the effect likelihood ratio test was excluded.26 For this study, a receiver operating characteristic area under the curve (AUC) >0.95 was required for each histological type.
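
The backward elimination loop can be sketched as follows. This is an illustrative outline only: the models in the study were fitted manually in JMP, and the `fit` callable here is a hypothetical stand-in for fitting a nominal logistic regression and returning each marker's effect likelihood ratio P-value together with each subtype's AUC. The stopping rule is our reading of the text: a marker is dropped only if every subtype's AUC stays at or above the threshold afterwards.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: fit(markers) -> (P-value per marker, AUC per subtype).
FitFn = Callable[[List[str]], Tuple[Dict[str, float], Dict[str, float]]]


def backward_eliminate(markers: List[str], fit: FitFn, min_auc: float = 0.95) -> List[str]:
    """Drop the least significant marker until a removal would break the AUC criterion."""
    current = list(markers)
    while len(current) > 1:
        p_values, _ = fit(current)
        # Candidate for exclusion: highest effect likelihood ratio P-value.
        worst = max(current, key=lambda m: p_values[m])
        reduced = [m for m in current if m != worst]
        _, aucs = fit(reduced)
        if min(aucs.values()) < min_auc:
            break  # removing `worst` would push some subtype below the threshold
        current = reduced
    return current


def toy_fit(markers: List[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Made-up numbers purely to exercise the loop; not study data.
    p = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.9}
    aucs = {"subtype 1": 0.99 if "C" in markers else 0.90, "subtype 2": 0.97}
    return {m: p[m] for m in markers}, aucs


print(backward_eliminate(["A", "B", "C", "D"], toy_fit))  # ['A', 'B', 'C']
```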

Validation of the prediction equations was accomplished by applying the final immunohistochemical panel to the validation cohort to generate predicted histological subtypes, which were compared with the consensus histopathological subtype assignment. The subtype considered most likely by the model was taken as the model's subtype prediction for that case, even if its probability was <50%.
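
The assignment rule and the agreement statistic reported in the Results can be illustrated as follows. This sketch assumes the model output is a dictionary of subtype probabilities per case; the function names are hypothetical, and κ here is the standard unweighted Cohen's κ comparing predicted subtypes with consensus diagnoses.

```python
from collections import Counter
from typing import Dict, Sequence


def predict_subtype(probs: Dict[str, float]) -> str:
    """Most probable subtype, even when its probability is below 50%."""
    return max(probs, key=probs.get)


def cohens_kappa(consensus: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted Cohen's kappa between consensus and predicted labels."""
    n = len(consensus)
    observed = sum(a == b for a, b in zip(consensus, predicted)) / n
    ca, cb = Counter(consensus), Counter(predicted)
    # Chance agreement from the marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


probs = {"high-grade serous": 0.45, "endometrioid": 0.35, "clear cell": 0.20}
print(predict_subtype(probs))  # high-grade serous

consensus = ["clear cell", "clear cell", "mucinous", "mucinous"]
predicted = ["clear cell", "mucinous", "mucinous", "mucinous"]
print(cohens_kappa(consensus, predicted))  # 0.5
```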

Prediction equations derived from the small number of low-grade serous carcinomas in the archival (N=5) and tumor bank (N=7) cohorts might be unreliable, an inherent weakness of the study design. To address this, we applied both prediction equations to 10 serous borderline tumors from the tumor bank at Vancouver General Hospital. SEK generated all statistical analyses and prediction equations. For all analyses, P<0.05 was considered statistically significant. All statistical analyses were performed using JMP v 8.0.1 (SAS Institute, Cary, NC, USA). This study complies with the applicable STARD reporting guidelines.27

Results

Basic clinical parameters and morphologically assessed subtype diagnoses for the archival, tumor bank, and validation cohorts are summarized in Table 2. Because an interpretable staining result was required for all 22 immunomarkers, the final numbers of cases used to generate the prediction equations were 314 for the archival cohort and 242 for the tumor bank cohort. Included and excluded cases did not differ in tumor subtype frequency.

Table 2 Study populations

For both the archival and tumor bank cohorts, the original panel of 22 markers, which were differentially expressed across the subtypes of ovarian carcinoma, was subjected to full model fit, stepwise nominal logistic regression with backward elimination. The stopping criterion for the backward elimination process was that the receiver operating characteristic area under the curve (ROC AUC) remain ≥0.95 for each subtype. This iterative process independently yielded the same panel of nine markers for both the archival and tumor bank cohorts (Figure 1). Each equation was then used to predict histological subtype in its own cohort, and performance was assessed by sensitivity, specificity, ROC AUC, and Cohen's κ coefficient (κ). Overall, the equations predicted the consensus subtype assignment very well for both the archival (κ=0.88±0.02) and tumor bank (κ=0.86±0.04) cohorts (Tables 3 and 4).
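
The per-subtype quality measures can be computed one subtype at a time, treating that subtype as positive and all others as negative. This is a minimal sketch under that one-vs-rest assumption, with a hypothetical function name; the AUC uses the Mann-Whitney formulation (the probability that a randomly chosen case of the subtype receives a higher predicted probability for it than a randomly chosen case of another subtype, with ties counted as one half).

```python
from typing import Dict, Sequence


def subtype_quality(subtype: str,
                    consensus: Sequence[str],
                    probs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Sensitivity, specificity and one-vs-rest ROC AUC for one subtype."""
    predicted = [max(p, key=p.get) for p in probs]  # most probable subtype per case
    truth = [c == subtype for c in consensus]
    called = [p == subtype for p in predicted]
    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    fp = sum(c and not t for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))

    # Mann-Whitney AUC on the predicted probability of this subtype.
    scores = [p[subtype] for p in probs]
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)

    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "auc": wins / (len(pos) * len(neg))}


consensus = ["mucinous", "mucinous", "endometrioid", "endometrioid"]
probs = [{"mucinous": 0.9, "endometrioid": 0.1},
         {"mucinous": 0.4, "endometrioid": 0.6},
         {"mucinous": 0.3, "endometrioid": 0.7},
         {"mucinous": 0.2, "endometrioid": 0.8}]
print(subtype_quality("mucinous", consensus, probs))
# {'sensitivity': 0.5, 'specificity': 1.0, 'auc': 1.0}
```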

Table 3 Quality of archival prediction equation for the archival case series
Table 4 Quality of tumor bank prediction equation for the tumor bank case series

To test the accuracy of low-grade serous prediction, both prediction equations were applied to a set of 10 serous borderline tumors. The archival prediction equation classified 9 of 10 cases as low-grade serous and one as high-grade serous, whereas the tumor bank equation classified 1 of 10 cases as low-grade serous, 3 of 10 as high-grade serous, and 6 of 10 as endometrioid. These results led us to re-derive the tumor bank prediction equation based on only four subtypes (serous, clear cell, endometrioid, mucinous), ie without distinguishing between low-grade and high-grade serous carcinoma. This had little impact on the accuracy of prediction for the remaining four subtypes (Table 5).

Table 5 Quality of revised tumor bank prediction equation for the tumor bank case series

Each model was then cross-validated against the other cohort (Table 6). To investigate the difference between the archival and tumor bank prediction equations, we performed histological subtype-specific tests of heterogeneity for each marker between the archival and tumor bank cohorts. These showed a trend toward reduced antigenicity in the archival cohort compared with the tumor bank cohort, exemplified best by HNF1B staining frequencies in the endometrioid, clear cell, and mucinous subtypes (Table 7). Conversely, PGR expression in the high-grade serous subtype differed in the opposite direction, with 35.8% of cases in the archival cohort and 19.1% in the tumor bank cohort expressing PGR (P<0.001).
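
The subtype-specific heterogeneity test amounts to a Pearson χ² test on a cohort-by-score contingency table for one marker within one subtype. The sketch below is an illustration, not the JMP procedure used in the study; it computes closed-form P-values only for 1 and 2 degrees of freedom (the cases that arise for binary markers and for the three-tier TP53 score when two cohorts are compared), and the counts in the example are invented, not taken from Table 7.

```python
import math
from typing import Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi2_upper_p(chi2: float, df: int) -> float:
    """Upper-tail P-value; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise NotImplementedError("use a statistics library for df > 2")


# Invented counts: rows = archival, tumor bank; columns = positive, negative.
stat, df = pearson_chi2([[30, 70], [15, 85]])
print(round(stat, 3), df, round(chi2_upper_p(stat, df), 4))
```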

Table 6 Comparison of the archival and tumor bank equations on cross-validation
Table 7 Marker expression across subtypes expressed as percent positive

Validation was accomplished by applying both the archival and tumor bank prediction equations to the validation cohort (Figure 1). The archival equation yielded κ=0.85±0.06 and the revised tumor bank equation yielded κ=0.78±0.07 (Table 8). The archival equation had poor sensitivity for the low-grade serous (50%) and mucinous (66.7%) subtypes, possibly because of the small numbers of these types in the validation cohort (N=2 and N=3, respectively). Overall, the archival prediction equation misclassified five cases; the mean probability assigned by the model to these cases was 58.8% (s.d. 9.0), compared with 90.8% (s.d. 14.3) for correctly classified cases. With the tumor bank model, 80 cases had a subtype prediction probability of ≥80%, of which 68 (85%) were correctly classified. With the archival model, 64 cases had a subtype prediction probability of ≥80%, of which 60 (94%) were correctly classified (Figure 4).
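
The ≥80% analysis reduces to restricting attention to cases whose top predicted probability meets a threshold and computing the fraction correctly classified. A minimal sketch with a hypothetical function name; the final lines simply recompute the reported proportions from the counts given in the text (68 of 80 and 60 of 64, the latter rounding to 94%).

```python
from typing import Sequence, Tuple


def accuracy_above(top_probs: Sequence[float], correct: Sequence[bool],
                   threshold: float = 0.80) -> Tuple[int, int]:
    """Return (cases at or above threshold, correctly classified among them)."""
    kept = [ok for p, ok in zip(top_probs, correct) if p >= threshold]
    return len(kept), sum(kept)


# Recomputing the reported proportions from the counts in the text.
for model, (n_correct, n_total) in {"tumor bank": (68, 80), "archival": (60, 64)}.items():
    print(f"{model}: {n_correct}/{n_total} = {100 * n_correct / n_total:.2f}%")
# tumor bank: 68/80 = 85.00%
# archival: 60/64 = 93.75%
```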

Table 8 Quality of archival and tumor bank prediction equations for validation case series
Figure 4

Logistic regression graphs showing an increasing proportion of correct clinical diagnoses with increasing predicted probability. (a) Archival prediction on the archival cohort, (b) archival prediction on the tumor bank cohort, (c) tumor bank prediction on the archival cohort, (d) tumor bank prediction on the tumor bank cohort, (e) tumor bank prediction on the validation cohort, (f) archival prediction on the validation cohort.

We examined the prediction profiles for the excluded cases of undifferentiated and unclassified carcinomas. Both prediction equations classified 9 of 9 cases of undifferentiated carcinoma as high-grade serous. The five cases of unclassified carcinoma were predicted as high-grade serous (3) and mucinous (2) by the archival prediction equation and endometrioid (2), high-grade serous (1), and mucinous (2) by the tumor bank prediction equation. All four cases of serous carcinoma with transitional features (as defined in the Materials and methods section) were classified as high-grade serous by both prediction equations.

Discussion

In this study, we developed an immunomarker panel that can accurately predict ovarian carcinoma subtype. This marker panel can be easily applied in practice, providing objective molecular-based support for classification of ovarian carcinomas, enhancing traditional histopathology-based classification.

Ovarian carcinoma types differ with respect to precursor lesions, clinical behavior, molecular alterations, and biomarker expression.7, 8 As in breast carcinoma, where immunomarkers contribute to accurate diagnosis of tumor subtypes and can be used to guide therapeutic decision-making,28 tumor type-specific management has also been suggested for ovarian carcinoma.11 For example, in women diagnosed with mucinous or clear cell carcinomas, chemotherapy might be ineffective, whereas women with advanced high-grade serous carcinomas might benefit from neoadjuvant chemotherapy.9, 10 Integrating molecular findings into the classification of ovarian carcinomas has the potential to enhance the reproducibility and accuracy of diagnosis, which will become increasingly important. For example, as neoadjuvant therapy becomes a treatment option for advanced stage ovarian carcinoma,12 accurate typing of core biopsies or cytology specimens before treatment will be essential.

Strengths of the study design include the use of two large, independent cohorts, one of which had a larger proportion of minority subtypes (clear cell, endometrioid, and mucinous), and the independent derivation of the same nine-marker panel from 22 candidate markers in each of these cohorts. In addition, a third, independent validation cohort, consisting of cases representative of the frequency of subtypes seen in practice, was used.4, 6 The morphological ‘gold standard’, achieved by selecting only cases in which two expert gynecological pathologists independently agreed on the subtype diagnosis, ensured that the diagnoses in all three case series were uniform and as accurate as possible. Further, to ensure reproducibility, we restricted our candidate IHC markers to those for which commercially available antibodies and simple scoring systems could be used.

One limitation of this study is that the archival and tumor bank cohorts differ fundamentally in tissue handling and fixation; this was the main reason we kept the cohorts separate. As shown in Table 7, staining was in some instances reduced in the older archival cohort compared with the more recent tumor bank cohort. Whether these discordant staining results are caused by specimen age, by tissue handling and fixation, or by a combination of the two cannot be determined with our materials. For example, HNF1B was expressed in 22% more endometrioid carcinomas in the tumor bank cohort than in the archival cohort, and this trend toward more frequent expression in the tumor bank cohort was seen across all subtypes. Logistic regression with sample age showed that the reduced expression in the archival series becomes more pronounced with increasing specimen age, suggesting antigen degradation as the cause. With attention to tissue handling and advances in antigen retrieval techniques, immunohistochemistry has become a more robust technique, shown in quality assurance studies to deliver consistent results in diagnostic laboratories.29 This should be particularly true for small samples such as core biopsies or cytology specimens, which will be required for triage to neoadjuvant therapy in patients with advanced stage disease; for such samples, the tumor bank prediction equation may prove more accurate.

Another limitation of this study is the inability of the tumor bank model to distinguish between low-grade and high-grade serous carcinoma. Low-grade serous carcinomas account for 3% of ovarian carcinomas,6 making it difficult to accrue sufficient cases for model development. In many cases, low-grade serous carcinomas arise from serous borderline tumors rather than from serous tubal intraepithelial carcinoma; they are not associated with germline or somatic BRCA1/2 mutations or with TP53 mutations, frequently harbor BRAF or KRAS mutations, and respond less well to platinum-based chemotherapy than high-grade serous carcinomas.5 Although low-grade and high-grade serous carcinomas can be reproducibly diagnosed when large tissue samples are available for histopathological examination, distinguishing them on small samples is problematic, and further work is needed to develop biomarkers to aid in this particular differential diagnosis.

Immunohistochemistry has several advantages over other molecular tests: it is readily available in pathology laboratories, can be implemented as part of the normal diagnostic routine with rapid turnaround time, and is subject to established quality management programs. Interpretation by pathologists in morphological context avoids contradictory results between standard histopathological assessment and independent molecular tests, and retains the advantage of assessing protein expression with exact localization; for example, because of its extensive stromal expression, VIM would not be a useful marker if assessed at the mRNA level. Immunohistochemical marker panels have been suggested as surrogates for expression profiling in the subtyping of breast cancer.30 The complexity of the data generated by panels of immunostains requires new approaches to interpretation. Nine immunomarkers with the specified scoring scheme (eight 2-level variables and one 3-level variable) create 768 possible combinations for interpretation. Methods such as recursive partitioning can be used, but a significant downside of that approach is the potential for overfitting,31 whereas the nominal logistic regression model avoids overfitting and generates subtype-specific probabilities that are easy to interpret. To allow pathologists and researchers to use our findings immediately, we launched a calculator of subtype probability (COSP) on our website, http://www.gpec.ubc.ca/index.php?content=papers/ovcasubtype.php (Figure 5). The IHC panel and COSP are designed to be used with a full set of data for all nine markers, as there is no way to reliably impute missing results. Because this approach does not take into account any subjective input from the user, it is free of user bias.
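
Two points in this paragraph can be made concrete. First, eight binary markers and one three-level marker give 2⁸ × 3 = 256 × 3 = 768 staining profiles. Second, a nominal (multinomial) logistic regression turns a profile into subtype-specific probabilities via a softmax over linear predictors, with one subtype as the reference. The sketch below assumes that form and assumes TP53 enters as two indicator variables (score 1 and score 2 versus score 0); the function name, the choice of reference subtype, and the coefficients in the usage example are hypothetical placeholders, not the published COSP equations.

```python
import itertools
import math
from typing import Dict

BINARY_MARKERS = ["CDKN2A", "DKK1", "HNF1B", "MDM2", "PGR", "TFF3", "VIM", "WT1"]
TP53_LEVELS = [0, 1, 2]

# Every possible staining profile: 2**8 * 3 = 768.
PROFILES = list(itertools.product(*([[0, 1]] * len(BINARY_MARKERS)), TP53_LEVELS))


def subtype_probabilities(profile: Dict[str, int],
                          coefs: Dict[str, Dict[str, float]],
                          reference: str) -> Dict[str, float]:
    """Softmax over linear predictors; the reference subtype has predictor 0."""
    eta = {reference: 0.0}
    for subtype, c in coefs.items():
        z = c.get("intercept", 0.0)
        for marker in BINARY_MARKERS:
            z += c.get(marker, 0.0) * profile[marker]
        # Assumed encoding: TP53 as indicators for score 1 and score 2 (vs 0).
        z += c.get("TP53=1", 0.0) * (profile["TP53"] == 1)
        z += c.get("TP53=2", 0.0) * (profile["TP53"] == 2)
        eta[subtype] = z
    top = max(eta.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in eta.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


print(len(PROFILES))  # 768

# Placeholder coefficients for illustration only.
profile = {**{m: 0 for m in BINARY_MARKERS}, "TP53": 2}
coefs = {"clear cell": {"intercept": -1.0, "HNF1B": 3.0},
         "endometrioid": {"intercept": 0.5, "PGR": 1.5, "TP53=2": -2.0}}
print(subtype_probabilities(profile, coefs, reference="high-grade serous"))
```
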
The COSP should currently be considered as a tool for research purposes (eg validation of historical diagnoses for a study group, ensuring comparability of cohorts). Translation for clinical use would require internal validation by institutions planning to apply the COSP to clinical material. Future plans for the tool include the development of an imputation algorithm for missing data and incorporation of new subtype-specific biomarkers. We feel that this resource will complement traditional histopathological classification, and is suitable for use in cases with equivocal morphological features, or where morphological classification is not possible with certainty because of small sample size.

Figure 5

Screenshot of the web-based calculator of ovarian carcinoma subtype prediction (COSP). COSP can be used for what we have deemed ‘archival’ or ‘tumor bank’ grade material; it is at the user's discretion to determine which equation is most applicable to their specimen(s). COSP can be used in single-case mode or in batch mode, allowing analysis of large retrospective tissue microarray cohorts with data uploaded in table file format.

Refined typing that reflects the underlying tumor biology and correlates with response to therapy will contribute to improving the clinical management of women diagnosed with ovarian carcinoma.